DATA SCIENCE, ARCHITECTURE, AI and ML Part 1

I. An Overview of Data Science, AI, & ML

This series is an overview of aspects of Data Science, AI, & ML, presented in HTML format with code embedded within. It is written so that a general audience can follow it, and it is intended to be helpful for a variety of purposes. Having said that, please note this implementation features a modern multi-lingual approach using R, Python, Julia, Rust, Java, Scala, & SQL. In addition, serial and parallel (CPU, GPU, & Quantum) computational architectures are implemented through wrappers to C, C++, & CUDA code. There is a focus on open-source technologies, using ONNX, H2O.ai, & Spark for example, but also aspects of dealing with cloud computing (Azure, Amazon, Google). Understanding the ins and outs of coding particulars inherently requires expertise in that particular domain. For those who are researching or looking to learn on their own, reproducible coding & data conventions have been implemented & the embedded code is well commented. Referencing is performed through HTML links, allowing for easy click-&-go sourcing. The series is available to all on GitHub, according to the platform's licensing & conduct policies.

About Me

A quick download about me for the curious. I am an analytics professional, meaning I have worn hats like Chief of Analytics / Sr. Principal Consultant or Sr. Data Architect / Sr. Solutions Architect for signature projects with budgets of 8, 9, or 10 figures, for entities like AT&T, CenterPoint Energy, several marketing agencies under the Omnicom umbrella, as well as the federal government with the VA, the Dept. of Health & Human Services, and even a special cabinet committee project for the White House (along with the required security clearances those entail). My educational background is:

  • B.S. in Mathematics from the University of Texas in Probability, Statistics & Data Analysis, with minors in Actuarial Science & Business Foundations.
  • MBA from St. Edward's University in Executive & Operations Management.
  • Graduated with Honors from the Harvard Business Analytics Program.

I got my start in consulting as an estimator for Drees Custom Homes, the largest private builder in the US, where I was matrixed out to the VP of Corporate Operations to modernize organizational workflows through innovation & technology in their 12 divisions nationwide. I helped AT&T launch U-verse in one of the first mega-budget projects of that time to be managed end to end with analytics. For CenterPoint Energy, there was a huge smart-grid project, funded by the DOE, the State of Texas, the City of Houston, & CNP. When it was all said & done, my firm had built from the ground up, delivered, & transferred their operational analytics program for electricity distribution & smart-city management. The technology was a combination wireless mesh device network & WiMAX system with significant OTA capabilities, serving the fourth-largest city in the country. No doubt I was influenced by my father, who was a computer scientist, & my mother, who was both an engineer & a Ph.D. in neuroscience. Now let's get started.

A Data Science Mind Map

Data science is an interdisciplinary field that combines math, statistics, specialized programming, advanced analytics, artificial intelligence (AI), and machine learning. Its purpose is to uncover actionable insights hidden within data, usually with a data-to-decisions perspective. These insights guide decision-making and strategic planning.[1][2] In essence, data science examines large amounts of data to reveal hidden patterns, generate insights, and inform choices.[3]

  • Data science is “a concept to unify statistics, data analysis, informatics, and their related methods” to “understand and analyze actual phenomena” with data.[5] It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, information science, and domain knowledge.[6] However, data science is different from computer science and information science. Turing Award winner Jim Gray imagined data science as a “fourth paradigm” of science (empirical, theoretical, computational, and now data-driven) and asserted that “everything about science is changing because of the impact of information technology” and the data deluge.[7][8]

  • Big Data refers to extremely large and diverse collections of structured, unstructured, and semi-structured data that continue to grow exponentially over time. These data sets are so huge and complex in volume, velocity, and variety that traditional data management systems cannot store, process, and analyze them.

  • The amount and availability of data is growing rapidly, spurred on by digital technology advancements, such as connectivity, mobility, the Internet of Things (IoT), and Artificial Intelligence (AI). As data continues to expand and proliferate, new big data tools are emerging to help companies collect, process, and analyze data at the speed needed to gain the most value from it. 

  • A Data Scientist is a professional who creates code, applications, or platforms, applying AI/ML, modeling, or statistical knowledge to identify & deliver insights from data, or to develop capabilities that provide a competitive edge, improve profitability or efficiency, or deliver on the quality or experience objectives of a product, project, program, or service.[9]

  • A Data Architect is a practitioner of data architecture, a data management discipline concerned with designing, creating, deploying, and managing an organization's data architecture. Data architects define how data will be acquired, transported, stored, consumed, integrated, and managed by the different data-consuming entities and IT systems (the customer), as well as by any application systems interacting with that data in some way.[1] The discipline is closely allied with business architecture and is considered one of the four domains of enterprise architecture.

  • A CTO, CIO, CAO, CDO, CDSO, VP, or Director of Analytics / Decision Support is a senior leader who oversees the comprehensive data/technology strategy and associated operations of an organization. Responsibilities include setting the vision, goals, and standards for the data acquisition/management plan, as well as leading & managing teams of analysts, data scientists, architects, and engineers. Organizations compile, acquire, research, and use data to solve problems, optimize profit and efficiency, minimize turnover and churn, or streamline logistics, for the purpose of making better business decisions. These leaders also assess the complex information operating environment, including a host of forecasts, financials, market/product research, and organizational & external human interactivity, with the goal of reaching simpler, faster, better, smarter decision data sets to improve performance or to develop new capabilities that attain or maintain a competitive advantage. Their technical expertise is paramount for stress-testing the technological data processes for quality, punctuality, accuracy, & security, while exploiting & exploring in parallel.

A Data Architecture Graph

AI / ML - How did we get here?

The theoretical base for contemporary neural networks was independently proposed by Alexander Bain in 1873[6] and William James in 1890.[7] Both posited that human thought emerged from interactions among large numbers of neurons inside the brain. In 1949, Donald Hebb described Hebbian learning, the idea that neural networks can change and learn over time by strengthening a synapse every time a signal travels along it.[8]

Artificial neural networks were originally used to model biological neural networks starting in the 1930s under the approach of connectionism. However, starting with the invention of the perceptron, a simple artificial neural network, by Warren McCulloch and Walter Pitts in 1943,[9] followed by the implementation of one in hardware by Frank Rosenblatt in 1957,[3] artificial neural networks became increasingly used for machine learning applications instead, and increasingly different from their biological counterparts.

Modern Approaches

In 2012, technologists Thomas H. Davenport and DJ Patil declared “Data Scientist: The Sexiest Job of the 21st Century”,[26] a catchphrase that was picked up even by major-city newspapers like the New York Times[27] and the Boston Globe.[28] A decade later, they reaffirmed it, stating that “the job is more in demand than ever with employers”.[29]

This is an R Markdown document. R Markdown is a simple, easy-to-use plain-text language that combines code, data, and the results of data operations (including plots, tables, pipelines, SPs, inputs, & outputs) with commentary, all combined into a single nicely formatted and reproducible document.

  • The series will be self-contained in an renv environment.

  • I will use a specific data set for each part in the series; for Part I, I will stick with the Iris data set so the reader can follow along with a familiar example.

The Iris flower data set, or Fisher's Iris data set, is a multivariate data set used and made famous by the British statistician and biologist Ronald Fisher in his 1936 paper The use of multiple measurements in taxonomic problems as an example of linear discriminant analysis. The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish the species from each other.
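Fisher's 1936 analysis can be reproduced in a couple of lines; here is a minimal sketch using the MASS package (assumed available, as it ships with R):

```r
# Fisher's linear discriminant analysis on the Iris data
library(MASS)

data(iris)
lda.fit  <- lda(Species ~ ., data = iris)
lda.pred <- predict(lda.fit)$class

# Resubstitution confusion table: the linear discriminants separate the species well
table(Predicted = lda.pred, Actual = iris$Species)
```

As Fisher found, the petal measurements carry most of the discriminating power; only a few versicolor/virginica flowers are confused.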

Here is a sneak preview: a teaser chunk of code using the Iris data set. The outputs are two visuals; consider what makes them alike & what differentiates them.
#  Intake ==> ETL ==> [Training Data_idx]

# The well-known Iris data set is provided by R natively
data(iris)
iris$setosa <- iris$Species == "setosa"
iris$virginica <- iris$Species == "virginica"
iris$versicolor <- iris$Species == "versicolor"
iris.train.idx <- sample(x = nrow(iris), size = nrow(iris)*0.5)
iris.train <- iris[iris.train.idx,]
iris.valid <- iris[-iris.train.idx,]

# Pipe: data[]===> Neural_Net_Data-Model_Structure {NN_DM}

library(neuralnet)
iris.net <- neuralnet(setosa + versicolor + virginica ~
  Sepal.Length + Sepal.Width +  Petal.Length + Petal.Width, 
    data = iris.train,
     hidden = c(10,10), 
        err.fct = "ce", 
          linear.output = F, 
            lifesign = "minimal",
              stepmax = 1000000,
                threshold = 0.001,
                   rep = 5)
## hidden: 10, 10    thresh: 0.001    rep: 1/5    steps:     229    error: 0.00019  time: 0.1 secs
## hidden: 10, 10    thresh: 0.001    rep: 2/5    steps:     425    error: 8e-05    time: 0.23 secs
## hidden: 10, 10    thresh: 0.001    rep: 3/5    steps:     500    error: 6e-05    time: 2.65 secs
## hidden: 10, 10    thresh: 0.001    rep: 4/5    steps:     329    error: 0.00013  time: 0.27 secs
## hidden: 10, 10    thresh: 0.001    rep: 5/5    steps:     426    error: 0.00026  time: 0.25 secs
# Provide a Visual [{NN_DM}]====>{Graph_/Out}

thematic::thematic_on(bg = 'black' , fg = 'white', accent = 'lightgreen' )
NN <- suppressWarnings( NeuralNetTools::plotnet(iris.net ,  
    circle_cex  = 4,
     circle_col = "black",
      bord_col = "lightgreen",
       neg_col = "pink",
        pos_col = "lightgreen",
         cex_val = .75, 
          max_sp = TRUE,
           BIAS = TRUE,
            bias_y = .5,
             pad_x = 1.28,
              alpha = .1,
               rel_rsc = .75,
                alpha_val = .5))

# Predict Code F{NN_DM}(Predict)

iris.prediction <- compute(iris.net, iris.valid[-5:-8])

# Index F{NN_DM}(Predict_idx)
idx <- apply(iris.prediction$net.result, 1, which.max)

predicted <- c('setosa', 'versicolor', 'virginica')[idx]

# Tab Data Structure
NN_DM_Predict__ <- table(predicted, iris.valid$Species)

NN_DM_Predict__ 
##             
## predicted    setosa versicolor virginica
##   setosa         27          0         0
##   versicolor      0         23         0
##   virginica       0          3        22
(N = Nodes): 4 Input N's --> Bias_N1 --> 10 N's Hidden L1 --> Bias_N2 --> 10 N's Hidden L2 --> Bias_N3 --> 3 Output N's
3 errors: the model classified 3 versicolor flowers as virginica (the versicolor column of the virginica row). We will revisit this at the very end.
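Overall accuracy can be read straight off a confusion table like the one above, since correct predictions sit on the diagonal; a small sketch, hard-coding the counts shown above for illustration:

```r
# Accuracy from the confusion table: the diagonal holds the correct predictions
conf <- matrix(c(27, 0,  0,
                 0, 23,  0,
                 0,  3, 22),
               nrow = 3, byrow = TRUE,
               dimnames = list(predicted = c("setosa", "versicolor", "virginica"),
                               actual    = c("setosa", "versicolor", "virginica")))

accuracy <- sum(diag(conf)) / sum(conf)
accuracy   # 72 of 75 validation rows correct = 0.96
```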
library(GGally);library(viridis);library(hrbrthemes)
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## Loading required package: viridisLite
# A Visual from the Iris Data Set
suppressWarnings( ggparcoord(
  iris,
   columns = 1:4, 
    groupColumn = 5, 
     order = "anyClass",
      scale = "globalminmax",
        showPoints = TRUE, 
          title = "No scaling",
            alphaLines = 0.3) + 
  
scale_color_viridis(discrete = TRUE) )

Visual: a classifier leverages patterns in data. Ponder how the previous two visuals might relate to each other.

R Set Up

# options
options(warn = -1)
auto.snapshot <- getOption("renv.config.auto.snapshot")
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(fig.showtext = TRUE)

# Initial packages loaded 
librarian::shelf(
  # DATA/PARSING/PIPING
  viridis, magrittr, dplyr, showtext,
  # HTML/RMARKDOWN
  rmdformats, bookdown, bslib, knitr, RColorBrewer,
  hrbrthemes,
  # PARALLEL/GPU/SYSTEM
  parallelly, torch, benchmarkme, here,
  # SPARK/h2o/Sparkling_Water
  sparklyr, sparklyr.nested, sparktf, arrow,
  rsparkling, h2o, h2otools,
  # VISUALS
  plotly, ggplot2, ggforce, ggpubr, GGally,
  # TEXT
  text,
  # PYTHON
  reticulate,
  # MODELING
  neuralnet, usemodels, C50, dials, embed,
  xgboost, Matrix,
  tidymodels, caret)

PicPath = 'c:/Users/Administrator/Documents/R/NLP/udpipemodels/images_'

thematic::thematic_rmd()
thematic::thematic_on(bg = 'black' , fg = 'white', accent = 'lightgreen' )

# Implementing Timing Features 

knitr::opts_chunk$set(time_it = TRUE)
knitr::knit_hooks$set(time_it = local({
  now <- NULL
  function(before, options) {
    if (before) {
      # record the current time before each chunk
      now <<- Sys.time()
    } 
    else {
      # calculate the time difference after a chunk
      res <- difftime(Sys.time(), now, units = "secs")
      # return a character string to show the time
      paste("Time for this code chunk to run:", base::round(res,2), "seconds")
    }
  }
 }))
Here are my system specs. The OS is Windows Server 2022, running on a Lenovo SR635 hyperscale data-center server with an AMD EPYC chip.
# Project
here::dr_here()

# System Architecture 
benchmarkme::get_ram()
## 223 GB
benchmarkme::get_cpu()
## $vendor_id
## [1] "AuthenticAMD"
## 
## $model_name
## [1] "AMD EPYC 7443P 24-Core Processor"
## 
## $no_of_cores
## [1] 48
# FROM TORCH
library(torch)

cuda_device_count()
## [1] 1
 cuda_current_device()
## [1] 0
   cuda_is_available()
## [1] TRUE
     cuda_runtime_version()
## [1] '11.8.0'
cuda_get_device_capability(device = cuda_current_device())
## Major Minor 
##     7     5

Time for this code chunk to run: 4.19 seconds

Let’s talk ML, AI, Neural Networks, & Deep Learning

The simplest machine learning algorithm is the Perceptron Algorithm, used to determine whether an input belongs to one class or another.

Example #1. The perceptron algorithm can determine the AND operator: given binary inputs x1 and x2, is (x1 AND x2) equal to 0 or 1?

In this example we have the classic binary operation, the logical AND [&]. The perceptron is the purple dashed line, which has effectively parsed the 2-dimensional permutation space into 0 and 1 with a decision boundary: crossing the purple line means you are a one & not a zero. This example makes visually clear that this is a linear operator for sub-setting or classifying objects, events, or data structures. In 1957, Frank Rosenblatt "invented" a perceptron program on an IBM 704 computer at the Cornell Aeronautical Laboratory.
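The perceptron learning rule itself fits in a few lines of base R; this is a sketch of the classic rule on the AND truth table, not the original IBM 704 program:

```r
# Perceptron learning rule on the logical AND truth table
X <- matrix(c(0, 0,
              0, 1,
              1, 0,
              1, 1), ncol = 2, byrow = TRUE)
y <- c(0, 0, 0, 1)                       # x1 AND x2

step <- function(z) as.numeric(z >= 0)   # threshold activation
w <- c(0, 0); b <- 0; lr <- 0.1

for (epoch in 1:20) {                    # AND is linearly separable, so this converges
  for (i in 1:nrow(X)) {
    err <- y[i] - step(sum(w * X[i, ]) + b)
    w <- w + lr * err * X[i, ]           # nudge the decision boundary toward the error
    b <- b + lr * err
  }
}

as.vector(step(X %*% w + b))             # 0 0 0 1: the AND truth table recovered
```

The learned weights and bias define exactly the kind of linear decision boundary drawn as the purple dashed line above.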

During that time, scientists had discovered that brain cells (neurons) receive input from our senses via electrical signals. The neurons, in turn, use electrical signals to store information and to make decisions based on previous input. Rosenblatt correlated electrical stimulation with brain activity & learning. Frank, a trained psychologist, had the idea that his perceptrons could simulate brain principles, with the ability to learn and make decisions.

In the context of machine learning, a neural network is an artificial mathematical model used to approximate nonlinear functions. While early artificial neural networks were physical machines,[3] today they are almost always implemented in software. Perceptrons & neural networks are forms of supervised learning, which can be separated into two types of problems when data mining: classification and regression.

Classification problems use an algorithm to accurately assign test data into specific categories, such as separating apples from oranges. Or, in the real world, supervised learning algorithms can be used to classify spam into a separate folder from your inbox. Linear classifiers, support vector machines, decision trees, and random forests are all common types of classification algorithms. Regression is another type of supervised learning method that uses an algorithm to understand the relationship between dependent and independent variables. Regression models are helpful for predicting numerical values based on different data points, such as sales revenue projections for a given business. Some popular regression algorithms are linear regression, logistic regression, and polynomial regression.
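To make the contrast concrete, here is a sketch fitting a classifier and a regressor to the same Iris measurements (the choice of predictors and the new_flower values are illustrative only):

```r
data(iris)

# Classification: predict a category (is this flower versicolor?)
clf <- glm(I(Species == "versicolor") ~ Petal.Length + Petal.Width,
           data = iris, family = binomial)

# Regression: predict a numeric value (petal length in cm)
reg <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = iris)

new_flower <- data.frame(Petal.Length = 4.5, Petal.Width = 1.5,
                         Sepal.Length = 6.0, Sepal.Width = 2.8)

p_class <- predict(clf, new_flower, type = "response")  # a probability in [0, 1]
p_reg   <- predict(reg, new_flower)                     # a length in centimeters
```

The same data supports both framings; what changes is the type of the target variable and therefore the model family.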

II. A Neural Net in Action

library(neuralnet); library(caret)

# Iris has 3 species & 4 attributes to model 
data("iris"); str(iris); set.seed(19)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#Intake ==> ETL ==> [Training Data_idx]
 
indexes = createDataPartition(iris$Species, p = .85, list = F)

train = iris[indexes, ] ; test = iris[-indexes, ] 

xtest = test[, -5] ; ytest = test[, 5]



# Pipe: data[]===> Neural_Net_Data-Model_Structure {NNet_DM}

nnet =  neuralnet(
                  Species~., 
                  train,
                  hidden = c(4,3),
                  linear.output = FALSE
                 )


# Provide a Visual [{NNet_DM}]====>{Graph_/Out}

plot(
     nnet,
     col.entry.synapse = "grey1", 
     col.entry = "brown",
     col.hidden = "lightgreen", 
     col.hidden.synapse = "grey",
     col.out = "red", 
     col.out.synapse = "black",
     col.intercept = "black",
     information = TRUE,
     fontsize = 9,
     arrow.length = .15,
     rep = "best"
     )

Time for this code chunk to run: 2.19 seconds

thematic::thematic_on(
                      bg = 'black' , 
                      fg = 'white', 
                      accent = 'lightgreen'
                     )

NN <- suppressWarnings( 
                        NeuralNetTools::plotnet(nnet, 
                                      circle_cex    = 4,
                                      circle_col = "black",
                                      bord_col = "lightgreen",
                                      neg_col = "pink",
                                      pos_col = "lightgreen",
                                      cex_val = .55, 
                                      alpha_val = .5)
                      )

Time for this code chunk to run: 0.74 seconds

The learning rate in machine learning generally refers to the hyper-parameter that determines the step size at each iteration while moving toward a minimum of a loss function; it controls how much the model's weights are updated during training. An analytical method cannot be used to calculate the weights of a neural network. Instead, the weights must be discovered using stochastic gradient descent, an empirical optimization approach. In simpler terms, the stochastic gradient descent algorithm is used to train deep learning neural networks. A typical value for the learning rate is between 0.0 and 1.0. In actuality, though, we have two kinds of learning, or updating, in our models:

  • Machine learnable parameters – The parameters that the algorithms learn/estimate on their own during training for a particular data set.

  • Hyper-parameters – variables that machine learning engineers or data scientists set to precise values to regulate how algorithms learn and to tune the model's performance.

So what does the learning rate look like in practice?
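Stripped to its essentials, each gradient-descent update is w <- w - lr * gradient; here is a toy sketch on a one-parameter loss (the function and starting values are made up for illustration):

```r
# One-parameter gradient descent: minimize f(w) = (w - 3)^2
grad <- function(w) 2 * (w - 3)   # analytic derivative of the loss

w  <- 0                           # initial weight
lr <- 0.1                         # learning rate: the step-size hyper-parameter

for (i in 1:100) {
  w <- w - lr * grad(w)           # the core update rule
}
w                                 # converges toward the minimum at w = 3
```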

library(ggplot2)
library(ggpubr)
library(plotly)

# Neural networks are neurons connected sequentially; each such sequence is called a neuron layer.

set.seed(19)

circular <- function(x, R, centerX=0, centerY=0){
r = R * sqrt(runif(x))
theta = runif(x) * 2 * pi
x = centerX + r * cos(theta)
y = centerY + r * sin(theta)

z = data.frame(x = x, y = y)
return(z)
}

data1 <- circular(150,0.5)
data2 <- circular(150,1.5)

data1$Y <- 1
data2$Y <- 0
data_c <- rbind(data1,data2)


color <- c("yellow", "lightgreen")
plot(data_c, col = factor(data_c$Y))

rm(data1,data2, circular)

# Here is what we have so far:
library(ggplot2)
Plot1 <- ggplot(data_c ,aes(x,y, col = as.factor(Y))) + geom_point()

#Classification problem to solve with a neural network built from scratch in R
X <- as.matrix(data_c[,1:2])
Y <- as.matrix(data_c[,3])


# It is now time to create the structure of the neural network. To do so, we implement a neuron class which we can iterate through.
# The class has the following fields: the activation function, the number of incoming connections, and the number of neurons in the layer.

neuron <- setRefClass(
                      "neuron",
                      fields = list(
                      fun_act = "list",
                      number_connections = "numeric",
                      number_neurons = "numeric",
                      W = "matrix",
                      b = "numeric"
                     ),
                     
 # The constructor: store the activation function & dimensions, and initialize W and b randomly
methods = list(
initialize = function(
                      fun_act, 
                      number_connections,
                      number_neurons)
    {
      fun_act <<- fun_act
      number_connections <<- number_connections
      number_neurons <<- number_neurons
      
      W <<- matrix(runif(number_connections*number_neurons),nrow = number_connections)
     
       b <<- runif(number_neurons)
    }
  )
)


# Retrieve the activation functions & their derivatives ==> very useful when implementing gradient descent. 

# Sigmoid activation function and its derivative as follows:
sigmoid = function(x) 
  {
  y = list() 
  y[[1]] <- 1 / (1 + exp(-x))
  y[[2]] <- x * (1 - x)
  return(y)
  }

x <- seq(-5, 5, 0.01)
#s <- plot(x, sigmoid(x)[[1]], col = 'lightgreen')

# ReLU activation function and its derivative as follows:
relu <- function(x)
  {
  y <- list()
  y[[1]] <- ifelse(x < 0,0,x)
  y[[2]] <- ifelse(x < 0,0,1)
  return(y)
  }

#r = plot(x, relu(x)[[1]], col = 'lightgreen')


# The weights for the incoming connections (W); the bias (b) is added to every neuron after weighting with W.

n = ncol(X)                                     # Number of input features (connections into layer 1)
capas = c(n, 4, 8, 1)                           # Layer sizes: input, two hidden layers, output
Function = list(sigmoid, relu, sigmoid)         # Activation function for each layer
fuscia <- list()


for (i in 1:(length(capas) - 1))
  {
    fuscia[[i]] <- neuron$new(Function[i],capas[i], capas[i + 1])
  }


library(crayon)

cli::cli_alert_info("Class Definition: {fuscia}")
cat(green("\n  Class Instance created"));fuscia
## 
##   Class Instance created
## [[1]]
## Reference class object of class "neuron"
## Field "fun_act":
## [[1]]
## function(x) 
##   {
##   y = list() 
##   y[[1]] <- 1 / (1 + exp(-x))
##   y[[2]] <- x * (1 - x)
##   return(y)
##   }
## 
## Field "number_connections":
## [1] 2
## Field "number_neurons":
## [1] 4
## Field "W":
##            [,1]      [,2]      [,3]       [,4]
## [1,] 0.07396211 0.5202001 0.8658044 0.90293385
## [2,] 0.51024832 0.5226468 0.8529940 0.03439049
## Field "b":
## [1] 0.4044959 0.9406215 0.5616665 0.1888222
## 
## [[2]]
## Reference class object of class "neuron"
## Field "fun_act":
## [[1]]
## function(x)
##   {
##   y <- list()
##   y[[1]] <- ifelse(x < 0,0,x)
##   y[[2]] <- ifelse(x < 0,0,1)
##   return(y)
##   }
## 
## Field "number_connections":
## [1] 4
## Field "number_neurons":
## [1] 8
## Field "W":
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
## [1,] 0.2812473 0.6275187 0.4906343 0.9806989 0.9252840 0.7751652 0.2626343
## [2,] 0.3528488 0.5526032 0.4955991 0.9231114 0.5929445 0.8522817 0.7870782
## [3,] 0.1239218 0.4302792 0.5403289 0.7110307 0.3400002 0.3930699 0.9133510
## [4,] 0.6935446 0.3268714 0.7616799 0.7535075 0.2026201 0.2313034 0.1823203
##            [,8]
## [1,] 0.02728611
## [2,] 0.03334917
## [3,] 0.21608093
## [4,] 0.45169708
## Field "b":
## [1] 0.650707836 0.009437195 0.341280788 0.611787775 0.629912437 0.076977954
## [7] 0.020569250 0.242895248
## 
## [[3]]
## Reference class object of class "neuron"
## Field "fun_act":
## [[1]]
## function(x) 
##   {
##   y = list() 
##   y[[1]] <- 1 / (1 + exp(-x))
##   y[[2]] <- x * (1 - x)
##   return(y)
##   }
## 
## Field "number_connections":
## [1] 8
## Field "number_neurons":
## [1] 1
## Field "W":
##           [,1]
## [1,] 0.9752737
## [2,] 0.5134806
## [3,] 0.9814409
## [4,] 0.3123976
## [5,] 0.6806086
## [6,] 0.9050108
## [7,] 0.6584775
## [8,] 0.9161916
## Field "b":
## [1] 0.569508
cat(yellow("\n This is a field in class neuron\n",neuron$fields(), "\n"))
## 
##  This is a field in class neuron
##  list 
##  
##  This is a field in class neuron
##  numeric 
##  
##  This is a field in class neuron
##  numeric 
##  
##  This is a field in class neuron
##  matrix 
##  
##  This is a field in class neuron
##  numeric
# create a new plotting window and set the plotting area into a 1 x 2 array
par(mfrow = c(1, 2))
suppressWarnings(plot(x, relu(x)[[1]], col = 'lightgreen', main = 'relu'))
suppressWarnings(plot(x, sigmoid(x)[[1]], col = 'lightgreen', main = 'sigmoid'))

suppressWarnings(ggplotly(Plot1))
cat(yellow("\n Thus, we have a reference class 'neuron'; its instances in fuscia were initialized with W and b drawn from runif"))
## 
##  Thus, we have a reference class 'neuron'; its instances in fuscia were initialized with W and b drawn from runif

Time for this code chunk to run: 1.95 seconds

Output: Be sure to see the class definition with data & its fields.

Activation Function

Activation / Loss Functions

Loss Function: a function that returns the cost associated with the model and measures how well the model is doing on the training data. If the cost is too high, it means that our model's predictions deviate too much from the observed data. In any machine learning algorithm, our ultimate mission is to minimize the loss function; its trajectory over training is sometimes referred to as the evolution of the error.
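For mean squared error, both the cost and its gradient with respect to the predictions are simple expressions; a standalone sketch (note the 2/n factor in the gradient, which implementations often fold into other constants):

```r
# Mean squared error: the loss and its gradient w.r.t. the predictions
mse <- function(Yp, Yr) {
  list(loss = mean((Yp - Yr)^2),          # how far predictions sit from observations
       grad = 2 * (Yp - Yr) / length(Yp))
}

m <- mse(c(0.9, 0.1, 0.8), c(1, 0, 1))
m$loss   # (0.01 + 0.01 + 0.04) / 3 = 0.02
```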

# Training the Neural Network: Forward Propagation
# When doing the forward pass we will store the results of the neuron before and after applying the activation functions (z).
#  This will be useful on back propagation.

set.seed(19)
conflicted::conflicts_prefer(base::`%*%`)

entry <- function(fuscia, X,Y, cost){
out = list()
out[[1]] <- append(list(matrix(0,ncol = 2,nrow = 1)), list(X))
 i = 0
  for (i in c(1:(length(fuscia)))) {
    z = list((out[[length(out)]][[2]] %*% fuscia[[i]]$W + fuscia[[i]]$b))
    a = list(fuscia[[i]]$fun_act[[1]](z[[1]])[[1]])
    out[[i + 1]] <- append(z,a)
  }
return(out)
}

cost <- function(Yp, Yr){
y <- list()
y[[1]] <- mean((Yp - Yr)^2)   # MSE loss
y[[2]] <- (Yp - Yr)           # its (unscaled) derivative
return(y)
}

forward <- entry(fuscia, X, Y, cost)
head(forward[[4]][[2]])

#- Implemented a neural network ==> Examine Performance
#- Implementing front and back propagation w/ gradient descent on the same function

fuscia_neuronal <- function(fuscia, X, Y, cost, lr = 0.05){
## Front Prop
out = list()
out[[1]] <- append(list(matrix(0,ncol = 4,nrow = 1)), list(X))
 i = 0
  for (i in c(1:(length( fuscia)))) 
    {
    z = list((out[[length(out)]][[2]] %*% fuscia[[i]]$W + fuscia[[i]]$b))
    a = list( fuscia[[i]]$fun_act[[1]](z[[1]])[[1]])
    out[[i + 1]] <- append(z,a)
    }

## Back-prop & Gradient Descent
delta <- list() 

for (i in rev(1:length( fuscia))) 
  {
  z = out[[i + 1]][[1]]
  a = out[[i + 1]][[2]]
  if (i == length( fuscia)) 
    {
    delta[[1]] <- cost(a,Y)[[2]] * fuscia[[i]]$fun_act[[1]](a)[[2]]
    } 
  else{
    delta <- list(delta[[1]] %*% W_temp * fuscia[[i]]$fun_act[[1]](a)[[2]],delta)
      }
  W_temp = t(fuscia[[i]]$W)
  fuscia[[i]]$b <-  fuscia[[i]]$b - mean(delta[[1]]) * lr
  fuscia[[i]]$W <-  fuscia[[i]]$W - t(out[[i]][[2]]) %*% delta[[1]] * lr
 }
return(out[[length(out)]][[2]])
}
  
 # Test neural network n = 2500
result <- fuscia_neuronal(fuscia, X,Y, cost)
dim(result)
## [1] 300   1
i_ = 0

for (i_ in seq(2500)) {
  Yt = fuscia_neuronal(fuscia, X,Y, cost, lr = 0.01)
  if (i_ %% 25 == 0) 
    {
    if (i_ == 25) 
      {
      iteration <- i_
      error <- cost(Yt,Y)[[1]]
      }
    else
      {
       iteration <- c(iteration,i_)
       error <- c(error,cost(Yt,Y)[[1]])      
      }
    }
}


# visualize how the error of our neural network has evolved:

thematic::thematic_on(bg = 'black' , fg = 'lightgreen', accent = 'pink' )
plot(error)

library(plotly)

thematic::thematic_on(bg = 'black' , fg = 'white', accent = 'lightgreen' )

Loss_model <-  plot_ly(x = iteration, 
                       y =  error, 
                       type = 'scatter', 
                       mode = 'lines',
                       line = list(color = 'lightgreen'))

  Loss_model |> 
    plotly::layout(paper_bgcolor = "black",
      title = "Error for the Fuscia Neuronal",
       font = list(color = 'white'),
        plot_bgcolor = "grey2",
         xaxis = list(title = "Iterations",
                      gridcolor = 'grey',
                      showgrid = TRUE,
                      showline = TRUE,
                      showticklabels = TRUE,
                      tickcolor = 'white',
                      ticks = 'outside',
                      zeroline = FALSE),
         yaxis = list(title = "Loss",
                      gridcolor = 'grey',
                      showgrid = TRUE,
                      showline = TRUE,
                      showticklabels = TRUE,
                      tickcolor = 'white',
                      ticks = 'outside',
                      zeroline = FALSE))

Time for this code chunk to run: 4.4 seconds

III. The Effect of Gradient Descent

Learning Rate: This is the hyperparameter that determines the step size the gradient descent algorithm takes. Gradient descent is very sensitive to the learning rate: if it is too big, the algorithm may overshoot and bypass the local minimum; if it is too small, it can increase the total computation time to a very large extent. We will see the effect of the learning rate in depth later in the article.
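That sensitivity is easy to demonstrate on a toy problem. A minimal sketch (plain Python, using the hypothetical function f(x) = x², whose gradient is 2x and whose minimum sits at x = 0):

```python
def descend(lr, steps=50, x=1.0):
    """Run gradient descent on f(x) = x^2; returns the final distance to the minimum."""
    for _ in range(steps):
        x -= lr * 2 * x   # x_{k+1} = x_k - lr * f'(x_k)
    return abs(x)

print(descend(0.10))  # well-chosen rate: the distance to the minimum shrinks each step
print(descend(1.10))  # oversized rate: every step overshoots and |x| grows without bound
```

With lr = 0.10 each step multiplies x by 0.8, so it converges; with lr = 1.10 each step multiplies |x| by 1.2, so it diverges.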

Gradient: Basically, it is a measure of the steepness of a slope. Technically, the gradient is the vector of first-order partial derivatives of a function with respect to each of its variables. For example, if we consider linear regression, we have two parameters to optimize, the slope and the intercept, so we calculate the partial derivative of the cost w.r.t. each of them; together, those derivatives form the gradient.

Descent: To optimize parameters, we need to minimize errors. The aim of the gradient descent algorithm is to reach a local minimum (we always hope for the global minimum of the function, but once plain gradient descent settles into a local minimum it cannot escape it). The algorithm accomplishes this by iteratively calculating a step at every iteration, and this repeated stepping toward a point of minimum (in other words, descending) is known as the descent (enough of that going-down-the-hill example).

# Use Oranges Data Set
library(RColorBrewer)
data("Orange")

# Determine number of iterations
niter <- 85

# Determine learning rate / step size
alpha <- 1e-4

set.seed(19)
b0 <- rnorm(1) # intercept
b1 <- rnorm(1) # slope

# Set palette
cols <- colorRampPalette(rev(brewer.pal(n = 7, name = "RdBu")))(niter)
cols2 <- colorRampPalette((brewer.pal(n = 7, name = "RdBu")))(niter)

# Plot
# Set up a 3-panel base-graphics layout: fit | contour | colorbar
graphics::layout(matrix(c(1,1,2,2,3), nrow = 1, ncol = 5))
plot(age ~ circumference, data = Orange, 
     pch = 16,
     xlab = "Circumference (mm)",
     ylab = "Age (days)",
     col = cols2,
     col.axis = 'gold')


# Perform gradient descent
slopes <- rep(NA, niter)
intercepts <- rep(NA, niter)

for (i in 1:niter) {  
# prediction
  y <- b0 + b1 * Orange$circumference
  
# b0 = b0 - dJ/da * alpha
  b0 <- b0 - sum(y - Orange$age) / nrow(Orange) * alpha
  
# b1 = b1 - dJ/db * alpha
  b1 <- b1 - sum((y - Orange$age) * Orange$circumference) / nrow(Orange) * alpha
  abline(a = b0, b = b1, col = cols[i], lty = 2)
  
# Save estimates over all iterations
  intercepts[i] <- b0
  slopes[i] <- b1
}

title("Regression Fit")


# Cost function contour
allCombs <- expand.grid(
                        b0 = seq(-50, 50, 
                        length.out = 100),
                        b1 = seq(7.5, 8.5, 
                        length.out = 100)
                        )

res <- matrix(NA, 100, 100)
# a by rows, b by cols
for (i in 1:nrow(allCombs))
   {
    y <- allCombs$b0[i] + allCombs$b1[i] * Orange$circumference
    res[i] <- sum((y - Orange$age)**2)/(2*nrow(Orange))
   }

par( bg = "black", fg = "white")



# Plot MSE contour Diagram
contour(
        t(res), 
        xlab = expression(beta[1]),
        ylab = expression(beta[0]), 
        axes = F,
        nlevels = 25,
        col = cols2,
        col.lab = 'beige'
       )
  
axis(1, at = c(0, .5, 1), labels = c(7.5, 8, 8.5), col = 'lightyellow' , col.axis = 'yellow')
axis(2, at = c(0, .5, 1), labels = c(-50, 0, 50) , col = 'lightyellow' , col.axis = 'yellow')

points(
       (slopes - min(allCombs$b1)) / diff(range(allCombs$b1)),
       (intercepts - min(allCombs$b0)) / diff(range(allCombs$b0)),
        pch = 19, 
        cex = 1.5,
        col = cols,
        col.lab = 'beige'
  
)

title("MSE contour")

# Add colorbar
z = matrix(1:niter, nrow = 1)
image(
      1, 
      1:niter, z,
      col = cols, 
      axes = FALSE,
      xlab = "", 
      ylab = ""
      )

title("Iteration No.", col.main = 'white')
axis(2, at = c(1, seq(5, niter, by = 5)))

Time for this code chunk to run: 0.55 seconds

Visual: as the number of iterations increases from 1 (blue) to 85 (red), the regression fit and the path across the MSE contour both tighten.

IV. Exploratory Data Analysis (EDA) & ERM

We have to be able to communicate what is happening in these NN’s to our customers. Using Exploratory Data Analysis (EDA) visuals (below), we can present what is going on under the hood, e.g. how a NN distinguishes Iris setosa from the other two species based on petal length alone. See the regression line & residuals plots below.
library(plotly)
library(ggplot2)

# LM Plot 
p <- ggplot(Orange, aes(circumference, age))

# Use jitter to reduce overplotting
p <- p + geom_jitter(position = position_jitter(width = 0.5, height = 0.5)) + 
  geom_density_2d(aes(color = after_stat(level))) + 
  scale_color_viridis_c() + 
  geom_smooth(method = 'lm', formula =  y~x)



# Residuals Plot
library(broom)

mod <- lm(Orange$age ~ Orange$circumference )
df <- augment(mod)
g <- ggplot(df, aes(x = .fitted, y = .resid)) 

# Use jitter to reduce overplotting
g <- g + geom_jitter(position = position_jitter(width = 0.5, height = 0.5)) + 
     geom_density_2d(aes(color = after_stat(level))) + 
     scale_color_viridis_c() + 
     geom_smooth(method = 'lm', formula =  y~x)

ggplotly(p)
ggplotly(g)

Time for this code chunk to run: 2.43 seconds

Principal Component Analysis Interactive Plot

The EDA could include principal component analysis (PCA): a projection of the 4-dimensional iris flower measurements onto a 2-dimensional space using the first two principal components. We can see that the first principal component alone is useful in distinguishing the three species.

  • We could use simple rules like this: If PC1 < -1, then Iris Setosa.
  • If PC1 > 1.5 then Iris Virginica.
  • If -1 < PC1 < 1, then Iris Versicolor.

It is these patterns & others which we are leveraging in NN’s.
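As a hedged sketch of those rules in Python (assuming scikit-learn is available; the sign of a principal component is arbitrary, so the code flips PC1 if needed so that setosa lands on the negative side):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)    # classes: 0 setosa, 1 versicolor, 2 virginica
pc1 = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))[:, 0]
if pc1[y == 0].mean() > 0:           # align the arbitrary sign: setosa negative
    pc1 = -pc1

# Apply the simple threshold rules from the bullets above
pred = np.where(pc1 < -1, 0, np.where(pc1 > 1.5, 2, 1))
acc = (pred == y).mean()
print(round(acc, 2))                 # PC1 alone classifies most flowers correctly
```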

library(plotly)

data(iris)

axis = list(showline = FALSE,
            zeroline = FALSE,
            gridcolor = '#ffff',
            ticklen = 4,
            titlefont = list(size = 13))


fig  <- iris %>%
  plot_ly()

fig <- fig %>%
  add_trace(
            type = 'splom',
            dimensions = list(
           list(label = 'sepal length', values = ~Sepal.Length),
           list(label = 'sepal width', values = ~Sepal.Width),
           list(label = 'petal length', values = ~Petal.Length),
           list(label = 'petal width', values = ~Petal.Width)
           ),
 
           color = ~ Species, 
           colors = c('#636EFA','#EF553B','#00CC96') ,
           marker = list(size = 7,
                         line = list(
                                     width = 1,
                                     color = 'rgb(230,230,230)'
                                    )
                        )
          )    

fig <-  fig %>% plotly::style(diagonal = list(visible = FALSE))

fig <- fig %>%
plotly::layout(
                hovermode = 'closest',
                dragmode = 'select',
                plot_bgcolor = "lightgrey",
                xaxis = list(
                             domain = NULL, 
                             showline = F, 
                             zeroline = F,
                             gridcolor = '#ffff', 
                             ticklen = 4
                            ),
                yaxis = list(
                             domain = NULL, 
                             showline = F,
                             zeroline = F, 
                             gridcolor = '#ffff', 
                             ticklen = 4
                            ),
                xaxis2 = axis,
                xaxis3 = axis,
                xaxis4 = axis,
                yaxis2 = axis,
                yaxis3 = axis,
                yaxis4 = axis 
             )

# Create scatterplots all pairwise combinations for the 4 variables 

fig
pca <- prcomp(iris[, 1:4], scale = TRUE) 
pca 
## Standard deviations (1, .., p=4):
## [1] 1.7083611 0.9560494 0.3830886 0.1439265
## 
## Rotation (n x k) = (4 x 4):
##                     PC1         PC2        PC3        PC4
## Sepal.Length  0.5210659 -0.37741762  0.7195664  0.2612863
## Sepal.Width  -0.2693474 -0.92329566 -0.2443818 -0.1235096
## Petal.Length  0.5804131 -0.02449161 -0.1421264 -0.8014492
## Petal.Width   0.5648565 -0.06694199 -0.6342727  0.5235971
# Have a look at the results.  
# extract first two columns and convert to data frame 

pcaData <- as.data.frame(pca$x[, 1:2]) 
pcaData <- cbind(pcaData, iris$Species) 
colnames(pcaData) <- c("PC1", "PC2", "Species")   

# compute % variances 

percentVar <- base::round(100 * summary(pca)$importance[2, 1:2], 0)  


# starting ggplot2  add data points, x label, y label  

library(ggplot2)

q <-  ggplot(pcaData, 
             aes(PC1, 
                 PC2, 
                 color = Species, 
                 shape = Species)) +  
                 geom_point(size = 2) + 
                 xlab(paste0("PC1: ", percentVar[1], "% variance")) +
                 ylab(paste0("PC2: ", percentVar[2], "% variance")) +
                 ggtitle("Principal component analysis (PCA)") 

ggplotly(q)

Time for this code chunk to run: 0.55 seconds

We can gain many insights from EDA. Interactive analytics allows for a better experience for both the customer & the provider. It promotes engagement & creates data opportunities that a static experience may miss. For example, take the visual below.

  • The 150 flowers in the rows are organized into different clusters.

  • Setosa samples obviously formed a unique cluster, characterized by smaller (blue) petal length, petal width, and sepal length.

  • The other two subspecies are not clearly separated, but we can notice that some I. virginica samples form a small sub-cluster showing bigger petals.

  • The columns are also organized into dendrograms, which clearly suggest that petal length and petal width are highly correlated. Thus we can communicate, in part, how these attributes are weighted in our NN model, & what factors contribute to their magnitudes (strength) & sign (i.e. positive / negative).

  • Heat Map - The following code builds a heat map of the iris data set, with rows and columns divided by dendrograms.

    library(pheatmap)

    # convert to matrix & assign row names
    ma <- as.matrix(iris[, 1:4])
    row.names(ma) <- row.names(iris)
    par(bg = "black", fg = "white")

    pheatmap_ <- pheatmap(ma, scale = "column",
                          clustering_method = "average",
                          annotation_row = iris[, 5, drop = FALSE],
                          show_rownames = FALSE,
                          width = 7,
                          height = 5)
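The dendrogram's suggestion that petal length and petal width are highly correlated is easy to spot-check numerically. A hedged sketch in Python (assuming scikit-learn's copy of the iris data):

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data                     # columns: sepal l/w, petal l/w
r = np.corrcoef(X[:, 2], X[:, 3])[0, 1]  # Pearson r, petal length vs petal width
print(round(r, 3))                       # strongly positive, close to 1
```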

_____________________________________________________________________________________________________

Empirical Risk Minimization (ERM)

In general, the risk 𝑅(ℎ) cannot be computed because the distribution 𝑃(𝑥,𝑦) is unknown to the learning algorithm (this situation is referred to as agnostic learning). However, given a sample of independent and identically distributed (i.i.d.) training data points, we can compute an estimate, called the empirical risk, by averaging the loss function over the training set; more formally, by computing the expectation with respect to the empirical measure:

𝑅emp(ℎ) = (1/𝑛) ∑ᵢ₌₁ⁿ 𝐿(ℎ(𝑥ᵢ), 𝑦ᵢ)

The empirical risk minimization principle[1] states that the learning algorithm should choose a hypothesis ℎ^ which minimizes the empirical risk over the hypothesis class 𝐻:

ℎ^ = arg min ℎ∈𝐻 𝑅emp(ℎ)

Thus, the learning algorithm defined by the empirical risk minimization principle consists in solving the above optimization problem.
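A minimal sketch of the principle in plain Python, using a hypothetical one-dimensional threshold hypothesis class and the 0-1 loss (all names are illustrative, not from any library):

```python
def empirical_risk(h, data):
    """Average 0-1 loss of hypothesis h over the sample."""
    return sum(h(x) != y for x, y in data) / len(data)

def erm(hypotheses, data):
    """Pick the hypothesis that minimizes the empirical risk over H."""
    return min(hypotheses, key=lambda h: empirical_risk(h, data))

# Toy sample (x_i, y_i) and a tiny hypothesis class of threshold classifiers
data = [(0.2, 0), (0.4, 0), (0.5, 0), (0.6, 1), (0.9, 1)]
H = [lambda x, t=t: int(x >= t) for t in (0.1, 0.3, 0.55, 0.8)]

h_hat = erm(H, data)
print(empirical_risk(h_hat, data))  # 0.0: the t = 0.55 threshold fits the sample exactly
```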

H2O’s Deep Learning framework offers a wide range of activation functions. Advanced features such as adaptive learning rate, rate annealing, momentum training, dropout, L1 or L2 regularization, checkpoints, and grid search enable high predictive accuracy. Each compute node trains a copy of the global model parameters on its local data with multi-threading (asynchronously) and contributes periodically to the global model via model averaging across the network.

V. Sparkling Water / h2o ai


Above: H2O Flow is an open-source user interface for H2O. It is a web-based interactive environment that allows you to combine code execution, text, mathematics, plots, and rich media in a single document.

There are 3 ways to run h2o: i. Local, ii. System, iii. Spark-based.

I am connecting through Spark, in what they call sparkling water. (see below)

  • User Memory = (Heap Size-300MB)*(1-spark.memory.fraction) where 300MB stands for reserved memory and spark.memory.fraction property is 0.6 by default.

  • Execution memory = Usable Memory * spark.memory.fraction*(1-spark.memory.storageFraction)

  • Storage memory = Usable Memory * spark.memory.fraction*spark.memory.storageFraction

  • total_executor_memory = (total_ram_per_node -1) / executor_per_node

  • total_executor_memory = (64 - 1) / 3 = 21 (rounded down)

  • spark.executor.memory = total_executor_memory * 0.9

  • spark.executor.memory = 21 * 0.9 = 18 (rounded down)

  • memory_overhead = 21 * 0.1 = 3 (rounded up)

  • spark.executor.instances: Number of executors for the spark application.

  • spark.executor.memory: Amount of memory to use for each executor that runs the task.

  • spark.executor.cores: Number of concurrent tasks an executor can run.

  • spark.driver.memory: Amount of memory to use for the driver.

  • spark.driver.cores: Number of virtual cores to use for the driver process.

  • spark.sql.shuffle.partitions: Number of partitions to use when shuffling data for joins or aggregations.

  • spark.default.parallelism: Default number of partitions in resilient distributed data sets (RDDs) returned by transformations like join and aggregations.

  • spark.driver.memory can be set as the same as spark.executor.memory,

  • spark.driver.cores is set the same as spark.executor.cores.

  • spark.default.parallelism = spark.executor.instances * spark.executor.cores * 2

  • spark.default.parallelism = 8 * 5 * 2 = 80

  • Spark Context is the main entry point into Spark functionality; executors send regular heartbeat messages to the driver.

  • Install sparkling water: install.packages("C:/Users/Administrator/AppData/Local/spark/spark-3.5.0-bin-hadoop3/sparkling-water-3.46.0.3-1-3.5/rsparkling_3.46.0.3-1-3.5.tar.gz")

  • Install sparkling water - in project: renv::install('C:/Users/Administrator/AppData/Local/spark/spark-3.5.0-bin-hadoop3/sparkling-water-3.46.0.3-1-3.5/rsparkling_3.46.0.3-1-3.5.tar.gz')
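The sizing arithmetic in the bullets above can be collected into one small helper; a hedged sketch in Python (the 64 GB node with 3 executors per node, 5 cores per executor, and 8 instances is the hypothetical example from the bullets):

```python
import math

def size_executors(ram_gb, executors_per_node, cores_per_executor, instances):
    """Back-of-the-envelope Spark executor sizing, per the rules above."""
    total = (ram_gb - 1) // executors_per_node        # reserve 1 GB for the OS
    heap = math.floor(total * 0.9)                    # spark.executor.memory
    overhead = math.ceil(total * 0.1)                 # memory_overhead
    parallelism = instances * cores_per_executor * 2  # spark.default.parallelism
    return total, heap, overhead, parallelism

print(size_executors(64, 3, 5, 8))  # (21, 18, 3, 80)
```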

    Initialization

- Spark / h2o ai Initialization
library(sparklyr);library(rsparkling);library(h2o);
library(h2otools);library(dplyr)

# SPARK_RAPIDS_PLUGIN_JAR = 'C:/Users/Administrator/AppData/Local/spark/spark-3.5.0-bin-hadoop3/GPU/rapids-4-spark_2.12-24.06.0.jar'
# 
# SPARK_SQL_PLUGIN_JAR = 'C:/Users/Administrator/AppData/Local/spark/spark-3.5.0-bin-hadoop3/GPU/cudf-24.06.0.jar'

#Pipe:data[config]==>{S_data[config.default]} Initialize w/ defaults 

config <- spark_config() 
options(sparklyr.log.console = TRUE)

#Pipe:data[config]==>{S_data[config.user]} Initialize w/ user.settings

config["sparklyr.shell.driver-memory"] <- "32g"        
config["sparklyr.connect.cores.local"] <- 32        
config["sparklyr.connect.timeout"] <- 200            
config["sparklyr.log.console"] <- TRUE
config["spark.driver.extraJavaOptions"] <- "-Duser.timezone=UTC"
config["spark.executor.memory"] <- '32G' 
config["spark.executor.memoryOverhead"] <- '2G'
config["spark.executor.resource.gpu.amount"] <- 1
config["spark.sql.session.timeZone"] <- 'UTC'
config["spark.sql.files.maxPartitionBytes"] <- '512m'

#config["spark.executor.resource.gpu.amount"] <- 1
#config["spark.task.resource.gpu.amount"] <- .0125
#config["spark.task.cpus"] <- 1
#config["spark.rapids.sql.enabled"] <- TRUE
#config["spark.rapids.memory.gpu.maxAllocFraction"] <- .8
#config["spark.jars"] = 'SPARK_SQL_PLUGIN_JAR,SPARK_RAPIDS_PLUGIN_JAR'
#config["spark.rapids.sql.concurrentGpuTasks"] <- 2
#config["spark.executor.resource.gpu.discoveryScript"] <- GET_GPU
#config["spark.sql.shuffle.partitions.local"] <- 8
#config["spark.plugins"] <- 'com.nvidia.spark.SQLPlugin'
# Connect to local cluster with custom configuration

sc <- sparklyr::spark_connect(master = "local", version = "3.5.0", config = config )

#Pipe:data[config]==>{S_data[config]}===>{S_data[config].h2o}
h2oConf <- H2OConf()
config["h2oConf.setNthreads"] <- 40
hc <- H2OContext.getOrCreate(h2oConf)
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         11 seconds 995 milliseconds 
##     H2O cluster timezone:       America/Chicago 
##     H2O data parsing timezone:  UTC 
##     H2O cluster version:        3.46.0.3 
##     H2O cluster version age:    1 month 
##     H2O cluster name:           sparkling-water-Administrator_local-1720774146110 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   32.00 GB 
##     H2O cluster total cores:    48 
##     H2O cluster allowed cores:  48 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          127.0.0.1 
##     H2O Connection port:        54324 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 4.4.0 (2024-04-24 ucrt) 
## 
## Reference class object of class "H2OContext"
## Field "jhc":
## <jobj[22]>
##   ai.h2o.sparkling.H2OContext
##   
## Sparkling Water Context:
##  * Sparkling Water Version: 3.46.0.3-1-3.5
##  * H2O name: sparkling-water-Administrator_local-1720774146110
##  * cluster size: 1
##  * list of used nodes:
##   (executorId, host, port)
##   ------------------------
##   (0,192.168.1.19,54321)
##   ------------------------
## 
##   Open H2O Flow in browser: http://127.0.0.1:54324 (CMD + click in Mac OSX)
## 
## 
h2o.networkTest()
## Network Test: Launched from C-Cluster/192.168.1.19:54321
##                         destination                 1_bytes          1024_bytes
## 1     all - collective bcast/reduce   2.512 msec,  796  B/S  97 usec, 20.1 MB/S
## 2 self C-Cluster/192.168.1.19:54321    1.128 msec, 1.7 KB/S  504 usec, 3.9 MB/S
##          1048576_bytes
## 1  82 usec, 23.57 GB/S
## 2  598 usec, 3.26 GB/S

Time for this code chunk to run: 25.4 seconds

Spark UI w/ Sparkling Water tab upon successful connection

Sparkling Water allows users to combine the fast, scalable machine learning algorithms of H2O with the capabilities of Spark. With Sparkling Water, users can drive computation from Scala/R/Python and utilize the H2O Flow UI, providing an ideal machine learning platform for application developers.

Spark is an elegant and powerful general-purpose, open-source, in-memory platform with tremendous momentum. H2O is an in-memory application for machine learning that is reshaping how people apply math and predictive analytics to their business problems.

So let’s dance now: we’ll build models, predict, & evaluate different classes of models using ML.

- RSparkling
# Intake ==> ETL ==> [Connections Sockets]

h2o.networkTest()
## Network Test: Launched from C-Cluster/192.168.1.19:54321
##                         destination                 1_bytes          1024_bytes
## 1     all - collective bcast/reduce     157 usec, 12.4 KB/S  854 usec, 2.3 MB/S
## 2 self C-Cluster/192.168.1.19:54321   4.198 msec,  476  B/S 189 usec, 10.3 MB/S
##              1048576_bytes
## 1   2.678 msec, 746.6 MB/S
## 2      649 usec, 3.01 GB/S
iris
##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1            5.1         3.5          1.4         0.2     setosa
## 2            4.9         3.0          1.4         0.2     setosa
## 3            4.7         3.2          1.3         0.2     setosa
## 4            4.6         3.1          1.5         0.2     setosa
## 5            5.0         3.6          1.4         0.2     setosa
## 6            5.4         3.9          1.7         0.4     setosa
## 7            4.6         3.4          1.4         0.3     setosa
## 8            5.0         3.4          1.5         0.2     setosa
## 9            4.4         2.9          1.4         0.2     setosa
## 10           4.9         3.1          1.5         0.1     setosa
## 11           5.4         3.7          1.5         0.2     setosa
## 12           4.8         3.4          1.6         0.2     setosa
## 13           4.8         3.0          1.4         0.1     setosa
## 14           4.3         3.0          1.1         0.1     setosa
## 15           5.8         4.0          1.2         0.2     setosa
## 16           5.7         4.4          1.5         0.4     setosa
## 17           5.4         3.9          1.3         0.4     setosa
## 18           5.1         3.5          1.4         0.3     setosa
## 19           5.7         3.8          1.7         0.3     setosa
## 20           5.1         3.8          1.5         0.3     setosa
## 21           5.4         3.4          1.7         0.2     setosa
## 22           5.1         3.7          1.5         0.4     setosa
## 23           4.6         3.6          1.0         0.2     setosa
## 24           5.1         3.3          1.7         0.5     setosa
## 25           4.8         3.4          1.9         0.2     setosa
## 26           5.0         3.0          1.6         0.2     setosa
## 27           5.0         3.4          1.6         0.4     setosa
## 28           5.2         3.5          1.5         0.2     setosa
## 29           5.2         3.4          1.4         0.2     setosa
## 30           4.7         3.2          1.6         0.2     setosa
## 31           4.8         3.1          1.6         0.2     setosa
## 32           5.4         3.4          1.5         0.4     setosa
## 33           5.2         4.1          1.5         0.1     setosa
## 34           5.5         4.2          1.4         0.2     setosa
## 35           4.9         3.1          1.5         0.2     setosa
## 36           5.0         3.2          1.2         0.2     setosa
## 37           5.5         3.5          1.3         0.2     setosa
## 38           4.9         3.6          1.4         0.1     setosa
## 39           4.4         3.0          1.3         0.2     setosa
## 40           5.1         3.4          1.5         0.2     setosa
## 41           5.0         3.5          1.3         0.3     setosa
## 42           4.5         2.3          1.3         0.3     setosa
## 43           4.4         3.2          1.3         0.2     setosa
## 44           5.0         3.5          1.6         0.6     setosa
## 45           5.1         3.8          1.9         0.4     setosa
## 46           4.8         3.0          1.4         0.3     setosa
## 47           5.1         3.8          1.6         0.2     setosa
## 48           4.6         3.2          1.4         0.2     setosa
## 49           5.3         3.7          1.5         0.2     setosa
## 50           5.0         3.3          1.4         0.2     setosa
## 51           7.0         3.2          4.7         1.4 versicolor
## 52           6.4         3.2          4.5         1.5 versicolor
## 53           6.9         3.1          4.9         1.5 versicolor
## 54           5.5         2.3          4.0         1.3 versicolor
## 55           6.5         2.8          4.6         1.5 versicolor
## 56           5.7         2.8          4.5         1.3 versicolor
## 57           6.3         3.3          4.7         1.6 versicolor
## 58           4.9         2.4          3.3         1.0 versicolor
## 59           6.6         2.9          4.6         1.3 versicolor
## 60           5.2         2.7          3.9         1.4 versicolor
## 61           5.0         2.0          3.5         1.0 versicolor
## 62           5.9         3.0          4.2         1.5 versicolor
## 63           6.0         2.2          4.0         1.0 versicolor
## 64           6.1         2.9          4.7         1.4 versicolor
## 65           5.6         2.9          3.6         1.3 versicolor
## 66           6.7         3.1          4.4         1.4 versicolor
## 67           5.6         3.0          4.5         1.5 versicolor
## 68           5.8         2.7          4.1         1.0 versicolor
## 69           6.2         2.2          4.5         1.5 versicolor
## 70           5.6         2.5          3.9         1.1 versicolor
## 71           5.9         3.2          4.8         1.8 versicolor
## 72           6.1         2.8          4.0         1.3 versicolor
## 73           6.3         2.5          4.9         1.5 versicolor
## 74           6.1         2.8          4.7         1.2 versicolor
## 75           6.4         2.9          4.3         1.3 versicolor
## 76           6.6         3.0          4.4         1.4 versicolor
## 77           6.8         2.8          4.8         1.4 versicolor
## 78           6.7         3.0          5.0         1.7 versicolor
## 79           6.0         2.9          4.5         1.5 versicolor
## 80           5.7         2.6          3.5         1.0 versicolor
## 81           5.5         2.4          3.8         1.1 versicolor
## 82           5.5         2.4          3.7         1.0 versicolor
## 83           5.8         2.7          3.9         1.2 versicolor
## 84           6.0         2.7          5.1         1.6 versicolor
## 85           5.4         3.0          4.5         1.5 versicolor
## 86           6.0         3.4          4.5         1.6 versicolor
## 87           6.7         3.1          4.7         1.5 versicolor
## 88           6.3         2.3          4.4         1.3 versicolor
## 89           5.6         3.0          4.1         1.3 versicolor
## 90           5.5         2.5          4.0         1.3 versicolor
## 91           5.5         2.6          4.4         1.2 versicolor
## 92           6.1         3.0          4.6         1.4 versicolor
## 93           5.8         2.6          4.0         1.2 versicolor
## 94           5.0         2.3          3.3         1.0 versicolor
## 95           5.6         2.7          4.2         1.3 versicolor
## 96           5.7         3.0          4.2         1.2 versicolor
## 97           5.7         2.9          4.2         1.3 versicolor
## 98           6.2         2.9          4.3         1.3 versicolor
## 99           5.1         2.5          3.0         1.1 versicolor
## 100          5.7         2.8          4.1         1.3 versicolor
## 101          6.3         3.3          6.0         2.5  virginica
## 102          5.8         2.7          5.1         1.9  virginica
## 103          7.1         3.0          5.9         2.1  virginica
## 104          6.3         2.9          5.6         1.8  virginica
## 105          6.5         3.0          5.8         2.2  virginica
## 106          7.6         3.0          6.6         2.1  virginica
## 107          4.9         2.5          4.5         1.7  virginica
## 108          7.3         2.9          6.3         1.8  virginica
## 109          6.7         2.5          5.8         1.8  virginica
## 110          7.2         3.6          6.1         2.5  virginica
## 111          6.5         3.2          5.1         2.0  virginica
## 112          6.4         2.7          5.3         1.9  virginica
## 113          6.8         3.0          5.5         2.1  virginica
## 114          5.7         2.5          5.0         2.0  virginica
## 115          5.8         2.8          5.1         2.4  virginica
## 116          6.4         3.2          5.3         2.3  virginica
## 117          6.5         3.0          5.5         1.8  virginica
## 118          7.7         3.8          6.7         2.2  virginica
## 119          7.7         2.6          6.9         2.3  virginica
## 120          6.0         2.2          5.0         1.5  virginica
## 121          6.9         3.2          5.7         2.3  virginica
## 122          5.6         2.8          4.9         2.0  virginica
## 123          7.7         2.8          6.7         2.0  virginica
## 124          6.3         2.7          4.9         1.8  virginica
## 125          6.7         3.3          5.7         2.1  virginica
## 126          7.2         3.2          6.0         1.8  virginica
## 127          6.2         2.8          4.8         1.8  virginica
## 128          6.1         3.0          4.9         1.8  virginica
## 129          6.4         2.8          5.6         2.1  virginica
## 130          7.2         3.0          5.8         1.6  virginica
## 131          7.4         2.8          6.1         1.9  virginica
## 132          7.9         3.8          6.4         2.0  virginica
## 133          6.4         2.8          5.6         2.2  virginica
## 134          6.3         2.8          5.1         1.5  virginica
## 135          6.1         2.6          5.6         1.4  virginica
## 136          7.7         3.0          6.1         2.3  virginica
## 137          6.3         3.4          5.6         2.4  virginica
## 138          6.4         3.1          5.5         1.8  virginica
## 139          6.0         3.0          4.8         1.8  virginica
## 140          6.9         3.1          5.4         2.1  virginica
## 141          6.7         3.1          5.6         2.4  virginica
## 142          6.9         3.1          5.1         2.3  virginica
## 143          5.8         2.7          5.1         1.9  virginica
## 144          6.8         3.2          5.9         2.3  virginica
## 145          6.7         3.3          5.7         2.5  virginica
## 146          6.7         3.0          5.2         2.3  virginica
## 147          6.3         2.5          5.0         1.9  virginica
## 148          6.5         3.0          5.2         2.0  virginica
## 149          6.2         3.4          5.4         2.3  virginica
## 150          5.9         3.0          5.1         1.8  virginica
# Intake ==> ETL ==> [Training Data_idx]

index <- c(base::sample(1:50,25),
           base::sample(51:100,25), 
           base::sample(101:150,25))

# Pipe: data[iris]===> Spark_Neural_Net_Data-Model_Structure{S_NN_DM}

iris_Train = iris[index,]
iris_Test = iris[-index,]

# Pipe: data[iris]===> {S_Train.Test}===>{h2o_S_Train.Test}

iris.h2o_Train <- as.h2o(iris_Train)
iris.h2o_Test <- as.h2o(iris_Test)
# Pipe:data[h2o_.]==>{h2o_S_NN_DM}(epochs:100 ==> 5_nodes_hidden_L1) 

iris_nn_Train <- h2o.deeplearning(x = 1:4 ,y = 5, 
                            training_frame = iris.h2o_Train, 
                            validation_frame = iris.h2o_Test,
                            activation = "Tanh",
                            hidden = c(5), 
                            l1 = 1e-5,
                            epochs = 100, 
                            variable_importances = TRUE)
# Pipe:data[h2o_.]==>{h2o_S_NN_DM}(epochs:100 ==> 5_nodes_hidden_L1, nfolds:5 CV) 

iris_nn_Train.cv <- h2o.deeplearning(x = 1:4 ,
                            y = 5, 
                            training_frame = iris.h2o_Train, 
                            validation_frame = iris.h2o_Test,
                            activation = "Tanh",
                            hidden = c(5), 
                            l1 = 1e-5,
                            nfolds = 5,
                            epochs = 100) 
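The `nfolds = 5` argument asks h2o to cross-validate internally: the training rows are partitioned into five folds, each held out once while the other four train a model. A base-R sketch of the fold assignment (conceptual only, not h2o's internals; the seed is an arbitrary assumption):

```r
# Conceptual sketch of 5-fold assignment over 150 rows (seed is arbitrary)
set.seed(19)
k <- 5
folds <- sample(rep(1:k, length.out = 150))  # shuffle a balanced fold label
table(folds)                                 # 30 rows land in each fold
```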
# [Spark_Out]==>{S_tanh[data]} 

# x is undefined upstream in this chunk; create a plotting grid first
x <- seq(-5, 5, by = 0.1)

plot(x, tanh(x), 
     col = 'lightgreen',
     type = 'l',
     main = 'Activation Fn: Tanh'
     )
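Tanh is the activation used in the models above: it squashes inputs into (-1, 1), and its derivative, 1 - tanh(x)^2, is what backpropagation uses. The derivative is largest at 0 and vanishes for large |x|, which is a quick check in base R:

```r
x <- c(-3, 0, 3)
round(tanh(x), 4)        # values squashed into (-1, 1)
round(1 - tanh(x)^2, 4)  # derivative: 1 at x = 0, near 0 in the tails
```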

# Performance Code ==>Pipe: F{S_NN_DM}[Data]

h2o.performance(iris_nn_Train, train = TRUE)
## H2OMultinomialMetrics: deeplearning
## ** Reported on training data. **
## ** Metrics reported on full training frame **
## 
## Training Set Metrics: 
## =====================
## 
## Extract training frame with `h2o.getFrame("iris_Train_sid_81c2_1")`
## MSE: (Extract with `h2o.mse`) 0.02503523
## RMSE: (Extract with `h2o.rmse`) 0.1582252
## Logloss: (Extract with `h2o.logloss`) 0.1100941
## Mean Per-Class Error: 0.04
## AUC: (Extract with `h2o.auc`) NaN
## AUCPR: (Extract with `h2o.aucpr`) NaN
## AIC: (Extract with `h2o.aic`) NaN
## Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`)
## =========================================================================
## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
##            setosa versicolor virginica  Error     Rate
## setosa         25          0         0 0.0000 = 0 / 25
## versicolor      0         23         2 0.0800 = 2 / 25
## virginica       0          1        24 0.0400 = 1 / 25
## Totals         25         24        26 0.0400 = 3 / 75
## 
## Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
## =======================================================================
## Top-3 Hit Ratios: 
##   k hit_ratio
## 1 1  0.960000
## 2 2  1.000000
## 3 3  1.000000
# Performance Code F{S_NN_DM}(.mse)

h2o.mse(iris_nn_Train, train = TRUE)
## [1] 0.02503523
h2o.mse(iris_nn_Train.cv, xval = TRUE)
## [1] 0.03256607
# Performance Code F{S_NN_DM}(.varimp)

h2o.varimp(iris_nn_Train)
## Variable Importances: 
##       variable relative_importance scaled_importance percentage
## 1 Petal.Length            1.000000          1.000000   0.401914
## 2  Petal.Width            0.668017          0.668017   0.268485
## 3  Sepal.Width            0.491290          0.491290   0.197456
## 4 Sepal.Length            0.328788          0.328788   0.132144
# Predict Code F{S_NN_DM}(.predict)

predictions_Train <- h2o.predict(iris_nn_Train, iris.h2o_Train)
predictions_Train
##   predict    setosa  versicolor    virginica
## 1  setosa 0.9945152 0.005462464 2.231943e-05
## 2  setosa 0.9887402 0.011227083 3.269255e-05
## 3  setosa 0.9949694 0.005016568 1.407947e-05
## 4  setosa 0.9957555 0.004230833 1.364526e-05
## 5  setosa 0.9950313 0.004951623 1.709626e-05
## 6  setosa 0.9936373 0.006338289 2.445173e-05
## 
## [75 rows x 4 columns]
predictions_Test <- h2o.predict(iris_nn_Train, iris.h2o_Test)
predictions_Test 
##   predict    setosa  versicolor    virginica
## 1  setosa 0.9867850 0.013178777 3.618120e-05
## 2  setosa 0.9931986 0.006781547 1.986322e-05
## 3  setosa 0.9946448 0.005335784 1.946356e-05
## 4  setosa 0.9929678 0.007011573 2.065007e-05
## 5  setosa 0.9935740 0.006407667 1.833944e-05
## 6  setosa 0.9960492 0.003935277 1.551795e-05
## 
## [75 rows x 4 columns]
Y_hat_Train = as.factor(as.matrix(predictions_Train$predict))
Y_hat_Test = as.factor(as.matrix(predictions_Test$predict))
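With the predictions pulled back into R factors, a confusion matrix and overall accuracy take one line each in base R. A self-contained sketch (toy vectors stand in for `Y_hat_Test` and the true test labels):

```r
# Self-contained sketch: confusion matrix and accuracy in base R
# (toy vectors stand in for Y_hat_Test and the true labels)
actual <- factor(c("setosa", "setosa", "versicolor", "virginica"))
pred   <- factor(c("setosa", "setosa", "virginica",  "virginica"),
                 levels = levels(actual))
table(Actual = actual, Predicted = pred)  # counts per (actual, predicted) pair
mean(pred == actual)                      # overall accuracy
```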


# Pipe: data[h2o_.]===> Spark_GBM_Data-Model_Structure{S_GBM_DM}

iris_hex <- as.h2o(iris)
iris_gbm <- h2o.gbm(x = c(1:4), 
                    y = 5, 
                    training_frame = iris.h2o_Train)

Time for this code chunk to run: 11.07 seconds

Take note of the relative and scaled importance of each variable, as well as the percentage attributed to each. Let’s compare our models:

#DEEP LEARNING MODEL


# Create the learning curve plot 
#[Spark_Out]==>{.learning_curve_plot[h2o_.]} 

learning_curve <- h2o.learning_curve_plot(iris_nn_Train)


# Partial Dependency Graph 3 target classes 
#[Spark_Out]==>{.partialPlot[h2o_.]} 

suppressWarnings(h2o.partialPlot(object = iris_nn_Train, 
                newdata = iris_hex, 
                cols = "Petal.Width",
                targets = c("setosa", "virginica", "versicolor")))

## [[1]]
## PartialDependence: Partial dependency plot for Petal.Width and classes
##  setosa, virginica, versicolor
##    Petal.Width mean_response stddev_response std_error_mean_response
## 1     0.100000      0.367607        0.446107                0.036424
## 2     0.226316      0.360533        0.449881                0.036733
## 3     0.352632      0.354414        0.453039                0.036990
## 4     0.478947      0.349208        0.455554                0.037196
## 5     0.605263      0.344817        0.457431                0.037349
## 6     0.731579      0.341109        0.458708                0.037453
## 7     0.857895      0.337941        0.459442                0.037513
## 8     0.984211      0.335172        0.459695                0.037534
## 9     1.110526      0.332665        0.459518                0.037519
## 10    1.236842      0.330291        0.458922                0.037471
## 11    1.363158      0.327922        0.457864                0.037384
## 12    1.489474      0.325420        0.456237                0.037252
## 13    1.615789      0.322624        0.453888                0.037060
## 14    1.742105      0.319344        0.450634                0.036794
## 15    1.868421      0.315365        0.446281                0.036439
## 16    1.994737      0.310458        0.440647                0.035979
## 17    2.121053      0.304403        0.433575                0.035401
## 18    2.247368      0.297008        0.424944                0.034697
## 19    2.373684      0.288121        0.414659                0.033857
## 20    2.500000      0.277629        0.402625                0.032874
## 
## [[2]]
## PartialDependence: Partial dependency plot for Petal.Width and classes
##  setosa, virginica, versicolor
##    Petal.Width mean_response stddev_response std_error_mean_response
## 1     0.100000      0.021313        0.052868                0.004317
## 2     0.226316      0.029265        0.071026                0.005799
## 3     0.352632      0.039723        0.091920                0.007505
## 4     0.478947      0.053171        0.115398                0.009422
## 5     0.605263      0.070057        0.141265                0.011534
## 6     0.731579      0.090717        0.169087                0.013806
## 7     0.857895      0.115317        0.198144                0.016178
## 8     0.984211      0.143853        0.227425                0.018569
## 9     1.110526      0.176229        0.255777                0.020884
## 10    1.236842      0.212325        0.282247                0.023045
## 11    1.363158      0.251941        0.306375                0.025015
## 12    1.489474      0.294605        0.328161                0.026794
## 13    1.615789      0.339391        0.347801                0.028398
## 14    1.742105      0.384905        0.365461                0.029840
## 15    1.868421      0.429483        0.381183                0.031123
## 16    1.994737      0.471523        0.394891                0.032243
## 17    2.121053      0.509790        0.406560                0.033195
## 18    2.247368      0.543530        0.416352                0.033995
## 19    2.373684      0.572455        0.424458                0.034657
## 20    2.500000      0.596743        0.430829                0.035177
## 
## [[3]]
## PartialDependence: Partial dependency plot for Petal.Width and classes
##  setosa, virginica, versicolor
##    Petal.Width mean_response stddev_response std_error_mean_response
## 1     0.100000      0.611080        0.434954                0.035514
## 2     0.226316      0.610202        0.435208                0.035535
## 3     0.352632      0.605863        0.434131                0.035447
## 4     0.478947      0.597621        0.431738                0.035251
## 5     0.605263      0.585127        0.428097                0.034954
## 6     0.731579      0.568174        0.423274                0.034560
## 7     0.857895      0.546742        0.417274                0.034070
## 8     0.984211      0.520975        0.409900                0.033468
## 9     1.110526      0.491107        0.400615                0.032710
## 10    1.236842      0.457384        0.388592                0.031728
## 11    1.363158      0.420137        0.373003                0.030456
## 12    1.489474      0.379975        0.353352                0.028851
## 13    1.615789      0.337985        0.329634                0.026915
## 14    1.742105      0.295751        0.302355                0.024687
## 15    1.868421      0.255152        0.272438                0.022245
## 16    1.994737      0.218019        0.241075                0.019684
## 17    2.121053      0.185807        0.209812                0.017131
## 18    2.247368      0.159462        0.181007                0.014779
## 19    2.373684      0.139424        0.158010                0.012901
## 20    2.500000      0.125628        0.144303                0.011782
# Variable Importance Plot 
#[Spark_Out]==>{.permutation_importance_plot[h2o_.]}

suppressWarnings(h2o.permutation_importance_plot(iris_nn_Train, iris.h2o_Train))

# Learning Curve Graph 
# [Spark_Out]==>{(.print)[.learning_curve_plot[h2o_.]

suppressWarnings(print(learning_curve))

Time for this code chunk to run: 1.27 seconds

Learning curves show how an error metric depends on learning progress, e.g., RMSE vs. the number of trees trained so far in a GBM. A plot can show up to four curves: Training, Validation, Training on CV Models, and Cross-validation error.

A partial dependence plot gives a graphical depiction of the marginal effect of a variable on the response, measured as the change in the mean response. Note: unlike randomForest’s partialPlot, the mean response (probabilities) is returned when plotting partial dependence, rather than the mean of the log class probability.
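The marginal-effect idea can be reproduced by hand: hold the variable of interest at a fixed value for every row, predict, and average the predictions. A minimal base-R sketch, with `lm` standing in for the h2o learner:

```r
# Partial dependence by hand: fix Petal.Width at each grid value,
# predict for all rows, and average (lm stands in for the h2o model)
fit  <- lm(Sepal.Length ~ ., data = iris)
grid <- seq(min(iris$Petal.Width), max(iris$Petal.Width), length.out = 20)
pd   <- sapply(grid, function(v) {
  d <- iris
  d$Petal.Width <- v      # marginalize: every row gets the same value
  mean(predict(fit, d))   # mean response at this value
})
head(cbind(Petal.Width = grid, mean_response = pd))
```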

#GBM MODEL

# Create the learning curve plot 
#[Spark_Out]==>{.learning_curve_plot[h2o_.]} 

learning_curve <- h2o.learning_curve_plot(iris_gbm)

suppressWarnings(print(learning_curve))

# Partial Dependency Graph 3 target classes 
#[Spark_Out]==>{.partialPlot[h2o_.]} 

suppressWarnings(h2o.partialPlot(object = iris_gbm, 
                newdata = iris_hex, 
                cols = "Petal.Length", 
                targets = c("setosa")))

## PartialDependence: Partial dependency plot for Petal.Length and class setosa
##    Petal.Length mean_response stddev_response std_error_mean_response
## 1      1.000000      0.874189        0.173956                0.014203
## 2      1.310526      0.874189        0.173956                0.014203
## 3      1.621053      0.874189        0.173956                0.014203
## 4      1.931579      0.874189        0.173956                0.014203
## 5      2.242105      0.874189        0.173956                0.014203
## 6      2.552632      0.874189        0.173956                0.014203
## 7      2.863158      0.178106        0.200938                0.016407
## 8      3.173684      0.090476        0.097154                0.007933
## 9      3.484211      0.090476        0.097154                0.007933
## 10     3.794737      0.090476        0.097154                0.007933
## 11     4.105263      0.069856        0.069049                0.005638
## 12     4.415789      0.069288        0.068693                0.005609
## 13     4.726316      0.065703        0.062703                0.005120
## 14     5.036842      0.054115        0.073618                0.006011
## 15     5.347368      0.054358        0.073928                0.006036
## 16     5.657895      0.047911        0.065060                0.005312
## 17     5.968421      0.040208        0.054391                0.004441
## 18     6.278947      0.040208        0.054391                0.004441
## 19     6.589474      0.040208        0.054391                0.004441
## 20     6.900000      0.040208        0.054391                0.004441
suppressWarnings(h2o.partialPlot(object = iris_gbm, 
                newdata = iris_hex, 
                cols = "Petal.Width",
                targets = c("setosa", "virginica", "versicolor")))

## [[1]]
## PartialDependence: Partial dependency plot for Petal.Width and classes
##  setosa, virginica, versicolor
##    Petal.Width mean_response stddev_response std_error_mean_response
## 1     0.100000      0.379974        0.441103                0.036016
## 2     0.226316      0.379974        0.441103                0.036016
## 3     0.352632      0.378724        0.441707                0.036065
## 4     0.478947      0.378724        0.441707                0.036065
## 5     0.605263      0.378724        0.441707                0.036065
## 6     0.731579      0.378724        0.441707                0.036065
## 7     0.857895      0.308652        0.436679                0.035655
## 8     0.984211      0.298318        0.422488                0.034496
## 9     1.110526      0.298318        0.422488                0.034496
## 10    1.236842      0.298625        0.422901                0.034530
## 11    1.363158      0.298625        0.422901                0.034530
## 12    1.489474      0.298625        0.422901                0.034530
## 13    1.615789      0.298625        0.422901                0.034530
## 14    1.742105      0.337349        0.436638                0.035651
## 15    1.868421      0.337317        0.436662                0.035653
## 16    1.994737      0.337320        0.436660                0.035653
## 17    2.121053      0.337149        0.436786                0.035663
## 18    2.247368      0.337149        0.436786                0.035663
## 19    2.373684      0.337149        0.436786                0.035663
## 20    2.500000      0.337149        0.436786                0.035663
## 
## [[2]]
## PartialDependence: Partial dependency plot for Petal.Width and classes
##  setosa, virginica, versicolor
##    Petal.Width mean_response stddev_response std_error_mean_response
## 1     0.100000      0.273976        0.408173                0.033327
## 2     0.226316      0.273976        0.408173                0.033327
## 3     0.352632      0.273247        0.407347                0.033260
## 4     0.478947      0.273247        0.407347                0.033260
## 5     0.605263      0.273247        0.407347                0.033260
## 6     0.731579      0.273247        0.407347                0.033260
## 7     0.857895      0.267080        0.407082                0.033238
## 8     0.984211      0.261006        0.401017                0.032743
## 9     1.110526      0.261006        0.401017                0.032743
## 10    1.236842      0.259442        0.400870                0.032731
## 11    1.363158      0.259442        0.400870                0.032731
## 12    1.489474      0.259442        0.400870                0.032731
## 13    1.615789      0.259442        0.400869                0.032731
## 14    1.742105      0.420001        0.420939                0.034370
## 15    1.868421      0.420199        0.421206                0.034391
## 16    1.994737      0.420198        0.421206                0.034391
## 17    2.121053      0.419831        0.422100                0.034464
## 18    2.247368      0.419831        0.422100                0.034464
## 19    2.373684      0.419831        0.422100                0.034464
## 20    2.500000      0.419831        0.422100                0.034464
## 
## [[3]]
## PartialDependence: Partial dependency plot for Petal.Width and classes
##  setosa, virginica, versicolor
##    Petal.Width mean_response stddev_response std_error_mean_response
## 1     0.100000      0.346050        0.415938                0.033961
## 2     0.226316      0.346050        0.415938                0.033961
## 3     0.352632      0.348028        0.417041                0.034051
## 4     0.478947      0.348028        0.417041                0.034051
## 5     0.605263      0.348028        0.417041                0.034051
## 6     0.731579      0.348028        0.417041                0.034051
## 7     0.857895      0.424268        0.444052                0.036257
## 8     0.984211      0.440676        0.434542                0.035480
## 9     1.110526      0.440676        0.434542                0.035480
## 10    1.236842      0.441933        0.435035                0.035520
## 11    1.363158      0.441933        0.435035                0.035520
## 12    1.489474      0.441933        0.435035                0.035520
## 13    1.615789      0.441933        0.435035                0.035520
## 14    1.742105      0.242650        0.332170                0.027122
## 15    1.868421      0.242484        0.332286                0.027131
## 16    1.994737      0.242482        0.332284                0.027131
## 17    2.121053      0.243019        0.333555                0.027235
## 18    2.247368      0.243019        0.333555                0.027235
## 19    2.373684      0.243019        0.333555                0.027235
## 20    2.500000      0.243019        0.333555                0.027235
# Variable Importance Plot 
#[Spark_Out]==>{.permutation_importance_plot[h2o_.]}

suppressWarnings(h2o.permutation_importance_plot(iris_gbm, iris.h2o_Train))

Time for this code chunk to run: 1.29 seconds


VI. Optimizing Hyper-Parameters

A model hyperparameter is a characteristic of a model that is external to the model and whose value cannot be estimated from the data; it must be set before the learning process begins. Examples include C in Support Vector Machines, k in k-Nearest Neighbors, and the number of hidden layers in Neural Networks.

  • Random Search. Define a search space as a bounded domain of hyperparameter values and randomly sample points in that domain.
  • Grid Search. Define a search space as a grid of hyperparameter values and evaluate every position in the grid.
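The two strategies differ only in how candidate points are drawn from the search space. A base-R sketch over a hypothetical two-parameter space (the parameter names, bounds, and budget here are illustrative assumptions):

```r
# Grid search: enumerate every combination in a (hypothetical) space
space <- expand.grid(hidden = c(5, 10, 20), l1 = c(1e-5, 1e-4, 1e-3))
nrow(space)  # 3 x 3 = 9 candidates, all evaluated

# Random search: sample a fixed budget of points from the same bounds
set.seed(19)
budget <- 4
random <- data.frame(hidden = sample(c(5, 10, 20), budget, replace = TRUE),
                     l1     = 10^runif(budget, min = -5, max = -3))
```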

In the following example we will manually build out a grid search to optimize our hyperparameters.

#Deep Learning Grid Search 

hc <- H2OContext.getOrCreate(h2oConf)

# Pipe:Create:data[h2o_.]==>{h2o_S_List}(Grid:.h2o.hidden ==>  hyper_params:.h2o) 

hidden_opt <- list(c(1), c(2), c(3), c(4), c(5), c(6), c(7), c(8), c(9), c(10),
                   c(3,4), c(4,4), c(5,4), c(6,4))

hyper_params <- list(hidden = hidden_opt)

# data in H2O format
# Pipe:data[h2o_.]==>{h2o_S_Deep_LNN_DM}(Grid:.h2o ==>  hyper_params:.h2o) 

model_grid <- h2o.grid(
                        "deeplearning",
                        
                        hyper_params = hyper_params,
                        x = 1:4 ,
                        y = 5, 
                        training_frame = iris.h2o_Train, 
                        validation_frame = iris.h2o_Test,
                        activation = "Tanh",
                        seed = 19, 
                        reproducible = TRUE, 
                        nfolds = 5
                        )
model_grid
## H2O Grid Details
## ================
## 
## Grid ID: Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4 
## Used hyper parameters: 
##   -  hidden 
## Number of models: 14 
## Number of failed models: 0 
## 
## Hyper-Parameter Search Summary: ordered by increasing logloss
##    hidden
## 1      10
## 2       8
## 3       7
## 4  [5, 4]
## 5  [4, 4]
## 6  [6, 4]
## 7       6
## 8       9
## 9  [3, 4]
## 10      5
## 11      3
## 12      4
## 13      2
## 14      1
##                                                                   model_ids
## 1  Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4_model_10
## 2   Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4_model_8
## 3   Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4_model_7
## 4  Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4_model_13
## 5  Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4_model_12
## 6  Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4_model_14
## 7   Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4_model_6
## 8   Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4_model_9
## 9  Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4_model_11
## 10  Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4_model_5
## 11  Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4_model_3
## 12  Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4_model_4
## 13  Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4_model_2
## 14  Grid_DeepLearning_iris_Train_sid_81c2_1_model_R_1720774150565_4_model_1
##    logloss
## 1  0.43372
## 2  0.48605
## 3  0.49865
## 4  0.52968
## 5  0.57270
## 6  0.58271
## 7  0.59055
## 8  0.61162
## 9  0.70005
## 10 0.78003
## 11 0.93817
## 12 1.00948
## 13 1.47813
## 14 3.25693
Best_Grid_Model <- h2o.getModel(model_grid@model_ids[[1]])

# Performance Code F{S_NN_DM}(.mse)

h2o.mse(Best_Grid_Model, xval = TRUE)
## [1] 0.1424292
h2o.performance(model = Best_Grid_Model , newdata = iris.h2o_Test)
## H2OMultinomialMetrics: deeplearning
## 
## Test Set Metrics: 
## =====================
## 
## MSE: (Extract with `h2o.mse`) 0.09396421
## RMSE: (Extract with `h2o.rmse`) 0.3065358
## Logloss: (Extract with `h2o.logloss`) 0.3041457
## Mean Per-Class Error: 0.1466667
## AUC: (Extract with `h2o.auc`) NaN
## AUCPR: (Extract with `h2o.aucpr`) NaN
## AIC: (Extract with `h2o.aic`) NaN
## Confusion Matrix: Extract with `h2o.confusionMatrix(<model>, <data>)`)
## =========================================================================
## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
##            setosa versicolor virginica  Error      Rate
## setosa         25          0         0 0.0000 =  0 / 25
## versicolor      0         19         6 0.2400 =  6 / 25
## virginica       0          5        20 0.2000 =  5 / 25
## Totals         25         24        26 0.1467 = 11 / 75
## 
## Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>, <data>)`
## =======================================================================
## Top-3 Hit Ratios: 
##   k hit_ratio
## 1 1  0.853333
## 2 2  1.000000
## 3 3  1.000000
# Variable Importance Plot 
#[Spark_Out]==>{.permutation_importance_plot[h2o_.]}

varimp <- h2o.varimp(Best_Grid_Model)
varimp
## Variable Importances: 
##       variable relative_importance scaled_importance percentage
## 1 Petal.Length            1.000000          1.000000   0.311023
## 2  Sepal.Width            0.922365          0.922365   0.286877
## 3  Petal.Width            0.800324          0.800324   0.248920
## 4 Sepal.Length            0.492504          0.492504   0.153180

Time for this code chunk to run: 162.64 seconds

Above: We created 14 different model configurations with various network architectures. The configuration with a single hidden layer of 10 nodes (model_10, ranked first by log loss above) outperformed all the others, and we also analyzed variable importance. Let’s compare this version with our previous versions using the same visuals.
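The grid is ranked by multinomial log loss: the mean negative log probability the model assigns to the true class. For reference, the metric is straightforward to compute by hand (a sketch, not h2o's implementation; the probability matrix is toy data):

```r
# Multinomial log loss: mean negative log probability of the true class
logloss <- function(probs, actual_idx, eps = 1e-15) {
  p <- pmax(probs[cbind(seq_along(actual_idx), actual_idx)], eps)  # clip at eps
  -mean(log(p))
}

p <- matrix(c(0.90, 0.05, 0.05,   # obs 1: true class 1
              0.20, 0.70, 0.10),  # obs 2: true class 2
            nrow = 2, byrow = TRUE)
logloss(p, c(1, 2))
```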

hc <- H2OContext.getOrCreate(h2oConf)

# Variable Importance Plot 
#[Spark_Out]==>{.permutation_importance_plot[.Grid.h2o_.]}

varimp <- h2o.varimp(Best_Grid_Model)
varimp
## Variable Importances: 
##       variable relative_importance scaled_importance percentage
## 1 Petal.Length            1.000000          1.000000   0.311023
## 2  Sepal.Width            0.922365          0.922365   0.286877
## 3  Petal.Width            0.800324          0.800324   0.248920
## 4 Sepal.Length            0.492504          0.492504   0.153180
# Create the learning curve plot 
#[Spark_Out]==>{.learning_curve_plot[h2o_.]} 

learning_curve <- h2o.learning_curve_plot(Best_Grid_Model)
suppressWarnings(print(learning_curve))

# Variable Importance Plot 
#[Spark_Out]==>{.permutation_importance_plot[h2o_.]}

suppressWarnings(h2o.varimp_plot(Best_Grid_Model))

# Partial Dependency Graph 3 target classes 
#[Spark_Out]==>{.partialPlot[h2o_.]} 

suppressWarnings(h2o.partialPlot(object = Best_Grid_Model, 
                newdata = iris_hex, 
                cols = "Petal.Width",
                targets = c("setosa", "virginica", "versicolor")))

## [[1]]
## PartialDependence: Partial dependency plot for Petal.Width and classes
##  setosa, virginica, versicolor
##    Petal.Width mean_response stddev_response std_error_mean_response
## 1     0.100000      0.725999        0.272394                0.022241
## 2     0.226316      0.675680        0.293207                0.023940
## 3     0.352632      0.623271        0.311085                0.025400
## 4     0.478947      0.570259        0.324833                0.026523
## 5     0.605263      0.517296        0.332802                0.027173
## 6     0.731579      0.463131        0.332564                0.027154
## 7     0.857895      0.402971        0.319441                0.026082
## 8     0.984211      0.328495        0.286459                0.023389
## 9     1.110526      0.239321        0.234876                0.019178
## 10    1.236842      0.156017        0.182503                0.014901
## 11    1.363158      0.095800        0.137812                0.011252
## 12    1.489474      0.055929        0.097988                0.008001
## 13    1.615789      0.030266        0.064726                0.005285
## 14    1.742105      0.015181        0.039260                0.003206
## 15    1.868421      0.007293        0.021499                0.001755
## 16    1.994737      0.003535        0.011216                0.000916
## 17    2.121053      0.001797        0.006088                0.000497
## 18    2.247368      0.000952        0.003394                0.000277
## 19    2.373684      0.000519        0.001859                0.000152
## 20    2.500000      0.000295        0.001041                0.000085
## 
## [[2]]
## PartialDependence: Partial dependency plot for Petal.Width and classes
##  setosa, virginica, versicolor
##    Petal.Width mean_response stddev_response std_error_mean_response
## 1     0.100000      0.000386        0.001124                0.000092
## 2     0.226316      0.000611        0.001518                0.000124
## 3     0.352632      0.001112        0.002771                0.000226
## 4     0.478947      0.002334        0.006144                0.000502
## 5     0.605263      0.005474        0.013676                0.001117
## 6     0.731579      0.013827        0.028903                0.002360
## 7     0.857895      0.035914        0.061275                0.005003
## 8     0.984211      0.087525        0.129468                0.010571
## 9     1.110526      0.177099        0.230964                0.018858
## 10    1.236842      0.286331        0.316897                0.025875
## 11    1.363158      0.402395        0.349497                0.028536
## 12    1.489474      0.533425        0.333501                0.027230
## 13    1.615789      0.672618        0.284730                0.023248
## 14    1.742105      0.794699        0.218665                0.017854
## 15    1.868421      0.881800        0.156098                0.012745
## 16    1.994737      0.934135        0.111287                0.009087
## 17    2.121053      0.962880        0.078757                0.006430
## 18    2.247368      0.978728        0.051255                0.004185
## 19    2.373684      0.987645        0.029718                0.002426
## 20    2.500000      0.992575        0.016298                0.001331
## 
## [[3]]
## PartialDependence: Partial dependency plot for Petal.Width and classes
##  setosa, virginica, versicolor
##    Petal.Width mean_response stddev_response std_error_mean_response
## 1     0.100000      0.273614        0.272453                0.022246
## 2     0.226316      0.323709        0.293359                0.023953
## 3     0.352632      0.375617        0.311486                0.025433
## 4     0.478947      0.427408        0.325928                0.026612
## 5     0.605263      0.477230        0.335859                0.027423
## 6     0.731579      0.523042        0.341205                0.027859
## 7     0.857895      0.561115        0.343597                0.028055
## 8     0.984211      0.583980        0.347875                0.028404
## 9     1.110526      0.583580        0.358951                0.029308
## 10    1.236842      0.557652        0.368064                0.030052
## 11    1.363158      0.501805        0.359335                0.029340
## 12    1.489474      0.410646        0.327874                0.026771
## 13    1.615789      0.297116        0.275835                0.022522
## 14    1.742105      0.190120        0.210700                0.017204
## 15    1.868421      0.110907        0.149684                0.012222
## 16    1.994737      0.062330        0.106146                0.008667
## 17    2.121053      0.035323        0.074958                0.006120
## 18    2.247368      0.020320        0.048823                0.003986
## 19    2.373684      0.011836        0.028338                0.002314
## 20    2.500000      0.007129        0.015525                0.001268

Time for this code chunk to run: 1.06 seconds

We have plenty of data to ensure that we both understand & implement an optimized model. In addition, H2O can be used to create dashboards & apps, and its ecosystem includes several companion tools:

H2O Driverless AI: Like H2O, this tool offers automatic machine learning, but it takes things a few steps further. As well as trying different machine learning algorithms (and ensembles of the available algorithms), it performs automatic feature engineering, produces data visualizations & post-training diagnostic plots, and reports performance metrics for each model.

H2O AutoDoc: AutoDoc generates automatic model documentation for models created in either H2O or Driverless AI (this feature is integrated into Driverless AI). You can also use it on any model built with the Python library scikit-learn.

H2O MLOps: If you are looking to put machine learning models into production, this is where MLOps (machine learning operations) can help. H2O MLOps can deploy models created in both H2O & H2O Driverless AI, and makes it easy to maintain them once they are in production. The tool uses Kubernetes for easy deployment, scaling & management.

Tidyverse

Repository: github.com/tidyverse/tidyverse
Written in: R
Type: Package collection
License: MIT
Website: www.tidyverse.org

ModernDive GitHub repository: https://github.com/moderndive/ModernDive_book (an HTML version of the text is available at https://moderndive.com/)
The tidyverse is a collection of open source packages for the R programming language introduced by Hadley Wickham[1] and his team that “share an underlying design philosophy, grammar, and data structures” of tidy data.[2] Characteristic features of tidyverse packages include extensive use of non-standard evaluation and encouraging piping.[3][4][5]
The tidyverse package and some of its individual packages comprise some of the most downloaded R packages.[6] The tidyverse is the subject of multiple books and papers.[7][8][9][10] In 2019, the ecosystem was published in the Journal of Open Source Software.[11]
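As a minimal flavor of the piping style the tidyverse encourages: the tidyverse itself uses magrittr's `%>%`, while base R (4.2+) provides the similar native `|>` pipe with the `_` placeholder, used in this sketch so it needs no packages.

```r
# Pipe iris through a select-then-summarise chain with base R's native |> pipe.
# The magrittr %>% used throughout the tidyverse works the same way conceptually.
petal_means <- iris |>
  subset(select = c(Species, Petal.Width)) |>             # keep two columns
  aggregate(Petal.Width ~ Species, data = _, FUN = mean)  # mean per species

petal_means  # 3 rows, one mean Petal.Width per species
```

Each step reads left to right, which is the core readability argument for piping over deeply nested function calls.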

Here I manually perform the grid hyper-parameter optimization process (i.e., tuning) using the tidymodels framework, part of the tidyverse ecosystem.

library(tidymodels)
library(dials)
library(embed)
set.seed(19)

tidymodels_prefer()

# Create Grid Specs for a NN
mlp_spec <- 
  mlp(hidden_units = tune(), penalty = tune(), epochs = tune()) %>% 
  set_engine("nnet", trace = 0) %>% 
  set_mode("classification")

mlp_param <- extract_parameter_set_dials(mlp_spec)
mlp_param %>% extract_parameter_dials("hidden_units")
## # Hidden Units (quantitative)
## Range: [1, 10]
mlp_param %>% extract_parameter_dials("penalty")
## Amount of Regularization (quantitative)
## Transformer: log-10 [1e-100, Inf]
## Range (transformed scale): [-10, 0]
mlp_param %>% extract_parameter_dials("epochs")
## # Epochs (quantitative)
## Range: [10, 1000]
# Create Random Grid - 'size' is the number of combinations
mlp_param %>% 
  grid_random(size = 1000) %>% 
  summary()
##   hidden_units       penalty              epochs      
##  Min.   : 1.000   Min.   :0.0000000   Min.   :  10.0  
##  1st Qu.: 3.000   1st Qu.:0.0000000   1st Qu.: 269.2  
##  Median : 5.000   Median :0.0000056   Median : 486.0  
##  Mean   : 5.399   Mean   :0.0381209   Mean   : 500.7  
##  3rd Qu.: 8.000   3rd Qu.:0.0020968   3rd Qu.: 740.2  
##  Max.   :10.000   Max.   :0.9892722   Max.   :1000.0
library(ggforce)
set.seed(19)

mlp_param %>% 
  grid_latin_hypercube(size = 20, original = FALSE) %>% 
  ggplot(aes(x = .panel_x, y = .panel_y)) + 
  geom_point() +
  geom_blank() +
  facet_matrix(vars(hidden_units, penalty, epochs), layer.diag = 2) + 
  labs(title = "Latin Hypercube design with 20 candidates")

set.seed(19)
iris_folds <- vfold_cv(iris)

iris_model_f <- model.frame(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
data.class(iris_model_f)
## [1] "data.frame"
iris_model_matrix <- model.matrix(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
data.class(iris_model_matrix)
## [1] "matrix"
# Create a model recipe which accounts for the PCA we look at earlier

iris_glm_rec <- 
  recipe(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris_Train) %>%
  step_pca(all_numeric_predictors(), num_comp = tune()) %>% 
  step_dummy(all_nominal_predictors())

iris_glm_rec

# Create Workflow

mlp_wflow <- 
  workflow() %>% 
  add_model(mlp_spec) %>% 
  add_recipe(iris_glm_rec)

mlp_param <- 
  mlp_wflow %>% 
  extract_parameter_set_dials() %>% 
  update(
    epochs = epochs(c(50, 200)),
    num_comp = num_comp(c(0, 40))
  )

roc_res <- metric_set(roc_auc)
set.seed(1305)
mlp_reg_tune <-
  mlp_wflow %>%
  tune_grid(
    iris_folds,
    grid = mlp_param %>% grid_regular(levels = 3),
    metrics = roc_res
  )

# Review the tuning results
mlp_reg_tune
## # Tuning results
## # 10-fold cross-validation 
## # A tibble: 10 × 4
##    splits           id     .metrics          .notes          
##    <list>           <chr>  <list>            <list>          
##  1 <split [135/15]> Fold01 <tibble [81 × 8]> <tibble [0 × 3]>
##  2 <split [135/15]> Fold02 <tibble [81 × 8]> <tibble [0 × 3]>
##  3 <split [135/15]> Fold03 <tibble [81 × 8]> <tibble [0 × 3]>
##  4 <split [135/15]> Fold04 <tibble [81 × 8]> <tibble [0 × 3]>
##  5 <split [135/15]> Fold05 <tibble [81 × 8]> <tibble [0 × 3]>
##  6 <split [135/15]> Fold06 <tibble [81 × 8]> <tibble [0 × 3]>
##  7 <split [135/15]> Fold07 <tibble [81 × 8]> <tibble [0 × 3]>
##  8 <split [135/15]> Fold08 <tibble [81 × 8]> <tibble [0 × 3]>
##  9 <split [135/15]> Fold09 <tibble [81 × 8]> <tibble [0 × 3]>
## 10 <split [135/15]> Fold10 <tibble [81 × 8]> <tibble [0 × 3]>
thematic::thematic_on(bg = 'black' , fg = 'white', accent = 'lightgreen' )
autoplot(mlp_reg_tune) + 
  scale_color_viridis_d(direction = -1) + 
  theme(legend.position = "top")

show_best(mlp_reg_tune) %>% select(-.estimator)
## # A tibble: 5 × 9
##   hidden_units penalty epochs num_comp .metric  mean     n std_err .config      
##          <int>   <dbl>  <int>    <int> <chr>   <dbl> <int>   <dbl> <chr>        
## 1           10       1     50        0 roc_auc     1    10       0 Preprocessor…
## 2           10       1    125        0 roc_auc     1    10       0 Preprocessor…
## 3           10       1    200        0 roc_auc     1    10       0 Preprocessor…
## 4           10       1     50       20 roc_auc     1    10       0 Preprocessor…
## 5           10       1    125       20 roc_auc     1    10       0 Preprocessor…

Time for this code chunk to run: 92.13 seconds

Regularization in machine learning is a technique used to prevent overfitting and enhance the generalization performance of models. Here’s what you need to know:

Role of Regularization

  1. Regularization adds a penalty term to the loss function during training. This discourages the model from assigning too much importance to individual features or coefficients.
    • It helps control model complexity, preventing overfitting to training data and improving generalization to new data.
    • By balancing bias and variance, regularization leads to better overall performance.
  2. We could repeat the same process if we wanted to model with gradient boosted trees. There are types of models where, from a single model fit, multiple tuning parameters can be evaluated without refitting. While not all models can exploit this feature, many broadly used ones do.
  3. Boosting models can typically make predictions across multiple values for the number of boosting iterations.
  4. Regularization methods, such as the glmnet model, can make simultaneous predictions across the amount of regularization used to fit the model.
  5. Multivariate adaptive regression splines (MARS) adds a set of nonlinear features to linear regression models (Friedman 1991). The number of terms to retain is a tuning parameter, and it is computationally fast to make predictions across many values of this parameter from a single model.
  6. Gradient boosting is a machine learning technique based on boosting in a functional space, where the target is pseudo-residuals rather than the typical residuals used in traditional boosting. It gives a prediction model in the form of an ensemble of weak prediction models, i.e., models that make very few assumptions about the data, which are typically simple decision trees.[1][2] When a decision tree is the weak learner, the resulting algorithm is called gradient-boosted trees; it usually outperforms random forest.[1] A gradient-boosted trees model is built in a stage-wise fashion as in other boosting methods, but it generalizes the other methods by allowing optimization of an arbitrary differentiable loss function.
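The penalty term described in item 1 can be made concrete with ridge regression, whose L2-penalized solution has a closed form. This is an illustrative base-R sketch (not part of the models above): predicting Petal.Width from the other iris measurements, with the coefficient norm shrinking as the penalty lambda grows.

```r
# Ridge regression closed form: beta(lambda) = (X'X + lambda * I)^(-1) X'y.
# The lambda * I term is the penalty that discourages large coefficients.
X <- scale(as.matrix(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]))
y <- iris$Petal.Width - mean(iris$Petal.Width)  # center the response

ridge_coef <- function(lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

b_ols   <- ridge_coef(0)    # no penalty: ordinary least squares
b_ridge <- ridge_coef(100)  # heavy penalty: coefficients shrink toward zero

sqrt(sum(b_ridge^2)) < sqrt(sum(b_ols^2))  # TRUE: the L2 norm shrinks
```

The L2 norm of the ridge solution is monotonically non-increasing in lambda, which is exactly the bias-variance trade the list above describes.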

Gradient Boosting Models

# Load necessary libraries
library(xgboost);library(Matrix)

# Load the iris dataset
data(iris)
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
iris$Species <- base::as.numeric(iris$Species) - 1

# Split data into features and target
X <- as.matrix(iris[, -5])
y <- iris$Species

# Convert to DMatrix
dtrain <- xgb.DMatrix(data = X, label = y)

# Set parameters for XGBoost
params <- list(
  booster = "gbtree",
  objective = "multi:softprob",
  eval_metric = "mlogloss",
  num_class = 3
)

# Train the model
bst_model <- xgb.train(
  params = params,
  data = dtrain,
  nrounds = 10,
  verbose = 0
)

# Extract tree information
tree_info <- xgb.model.dt.tree(model = bst_model)
head(tree_info)
##     Tree  Node     ID      Feature Split    Yes     No Missing    Quality
##    <int> <int> <char>       <char> <num> <char> <char>  <char>      <num>
## 1:     0     0    0-0 Petal.Length  2.45    0-1    0-2     0-1 72.2967682
## 2:     0     1    0-1         Leaf    NA   <NA>   <NA>    <NA>  0.4306220
## 3:     0     2    0-2         Leaf    NA   <NA>   <NA>    <NA> -0.2200489
## 4:     1     0    1-0 Petal.Length  2.45    1-1    1-2     1-1 18.0741920
## 5:     1     1    1-1         Leaf    NA   <NA>   <NA>    <NA> -0.2153110
## 6:     1     2    1-2  Petal.Width  1.75    1-3    1-4     1-3 41.9078407
##       Cover
##       <num>
## 1: 66.66666
## 2: 22.22222
## 3: 44.44444
## 4: 66.66666
## 5: 22.22222
## 6: 44.44444
# Plot the first tree
xgb.plot.tree(model = bst_model, trees = c(0,1,2))
# Plot the first tree
xgb.plot.tree(model = bst_model, trees = c(4,7,8))
# Plot the first tree
xgb.plot.tree(model = bst_model, trees = c(13,14,21))

Time for this code chunk to run: 1.1 seconds
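With the "multi:softprob" objective used above, predict() returns the class probabilities as one flat vector (row-major, one block of 3 values per observation). Here is a sketch of reshaping that vector, rebuilding the same model so the chunk is self-contained (and guarded in case xgboost is not installed):

```r
if (requireNamespace("xgboost", quietly = TRUE)) {
  library(xgboost)
  data(iris)  # reload, since Species was recoded to numeric above
  X <- as.matrix(iris[, 1:4])
  y <- as.numeric(iris$Species) - 1  # classes 0, 1, 2

  dtrain <- xgb.DMatrix(data = X, label = y)
  bst <- xgb.train(
    params = list(objective = "multi:softprob", num_class = 3,
                  eval_metric = "mlogloss"),
    data = dtrain, nrounds = 10, verbose = 0
  )

  # Flat vector of length 150 * 3 -> one row of probabilities per flower
  p <- predict(bst, dtrain)
  probs <- matrix(p, ncol = 3, byrow = TRUE)
  pred_class <- max.col(probs) - 1  # back to 0/1/2 labels

  head(probs)  # each row sums to 1
}
```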


Let’s build a recipe using GBM

library(usemodels); library(C50); library(magrittr); library(caret)
library(tidymodels,quietly = T)

 # important to have reproducible results
set.seed(19)

use_xgboost(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris,
            verbose = TRUE)
## xgboost_recipe <- 
##   recipe(formula = Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, 
##     data = iris) %>% 
##   step_zv(all_predictors()) 
## 
## xgboost_spec <- 
##   boost_tree(trees = tune(), min_n = tune(), tree_depth = tune(), learn_rate = tune(), 
##     loss_reduction = tune(), sample_size = tune()) %>% 
##   set_mode("classification") %>% 
##   set_engine("xgboost") 
## 
## xgboost_workflow <- 
##   workflow() %>% 
##   add_recipe(xgboost_recipe) %>% 
##   add_model(xgboost_spec) 
## 
## set.seed(31722)
## xgboost_tune <-
##   tune_grid(xgboost_workflow, resamples = stop("add your rsample object"), grid = stop("add number of candidate points"))
c5_spec <- 
  boost_tree(trees = tune()) %>% 
  set_engine("C5.0") %>% 
  set_mode('classification')

c50Grid <- expand.grid(.trials = c(1:9, (1:10)*10),
                       .model = c("tree", "rules"),
                       .winnow = c(TRUE, FALSE))

c50Grid
##    .trials .model .winnow
## 1        1   tree    TRUE
## 2        2   tree    TRUE
## 3        3   tree    TRUE
## 4        4   tree    TRUE
## 5        5   tree    TRUE
## 6        6   tree    TRUE
## 7        7   tree    TRUE
## 8        8   tree    TRUE
## 9        9   tree    TRUE
## 10      10   tree    TRUE
## 11      20   tree    TRUE
## 12      30   tree    TRUE
## 13      40   tree    TRUE
## 14      50   tree    TRUE
## 15      60   tree    TRUE
## 16      70   tree    TRUE
## 17      80   tree    TRUE
## 18      90   tree    TRUE
## 19     100   tree    TRUE
## 20       1  rules    TRUE
## 21       2  rules    TRUE
## 22       3  rules    TRUE
## 23       4  rules    TRUE
## 24       5  rules    TRUE
## 25       6  rules    TRUE
## 26       7  rules    TRUE
## 27       8  rules    TRUE
## 28       9  rules    TRUE
## 29      10  rules    TRUE
## 30      20  rules    TRUE
## 31      30  rules    TRUE
## 32      40  rules    TRUE
## 33      50  rules    TRUE
## 34      60  rules    TRUE
## 35      70  rules    TRUE
## 36      80  rules    TRUE
## 37      90  rules    TRUE
## 38     100  rules    TRUE
## 39       1   tree   FALSE
## 40       2   tree   FALSE
## 41       3   tree   FALSE
## 42       4   tree   FALSE
## 43       5   tree   FALSE
## 44       6   tree   FALSE
## 45       7   tree   FALSE
## 46       8   tree   FALSE
## 47       9   tree   FALSE
## 48      10   tree   FALSE
## 49      20   tree   FALSE
## 50      30   tree   FALSE
## 51      40   tree   FALSE
## 52      50   tree   FALSE
## 53      60   tree   FALSE
## 54      70   tree   FALSE
## 55      80   tree   FALSE
## 56      90   tree   FALSE
## 57     100   tree   FALSE
## 58       1  rules   FALSE
## 59       2  rules   FALSE
## 60       3  rules   FALSE
## 61       4  rules   FALSE
## 62       5  rules   FALSE
## 63       6  rules   FALSE
## 64       7  rules   FALSE
## 65       8  rules   FALSE
## 66       9  rules   FALSE
## 67      10  rules   FALSE
## 68      20  rules   FALSE
## 69      30  rules   FALSE
## 70      40  rules   FALSE
## 71      50  rules   FALSE
## 72      60  rules   FALSE
## 73      70  rules   FALSE
## 74      80  rules   FALSE
## 75      90  rules   FALSE
## 76     100  rules   FALSE
set.seed(19)

c5_fit <- suppressWarnings( 
                           train( Species ~ .,
                           data = iris_Train,
                           method = "C5.0",
                           tuneGrid = c50Grid,
                           trControl = trainControl(),
                           metric = "Accuracy",
                           importance = TRUE, 
                           preProc = c("center", "scale"))  
                          )

show_notes(.Last.tune.result)
## Great job! No notes to show.
predictors(c5_fit)
## [1] "Petal.Length"
summary(c5_fit) 
## 
## Call:
## (function (x, y, trials = 1, rules = FALSE, weights = NULL, control
##  = FALSE, sample = 0, earlyStopping = TRUE, label = "outcome", seed =
##  1442L), importance = TRUE)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Fri Jul 12 03:54:04 2024
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 75 cases (5 attributes) from undefined.data
## 
## 3 attributes winnowed
## Estimated importance of remaining attributes:
## 
##    2500%  Petal.Length
## 
## -----  Trial 0:  -----
## 
## Decision tree:
## 
## Petal.Length <= -1.059934: setosa (25)
## Petal.Length > -1.059934:
## :...Petal.Length <= 0.573912: versicolor (24)
##     Petal.Length > 0.573912: virginica (26/1)
## 
## -----  Trial 1:  -----
## 
## Decision tree:
## 
## Petal.Length <= -1.059934: setosa (18.8)
## Petal.Length > -1.059934:
## :...Petal.Length <= 0.7429305: versicolor (42.6/5.3)
##     Petal.Length > 0.7429305: virginica (13.6)
## 
## *** boosting abandoned (too few classifiers)
## 
## 
## Evaluation on training data (75 cases):
## 
##      Decision Tree   
##    ----------------  
##    Size      Errors  
## 
##       3    1( 1.3%)   <<
## 
## 
##     (a)   (b)   (c)    <-classified as
##    ----  ----  ----
##      25                (a): class setosa
##            24     1    (b): class versicolor
##                  25    (c): class virginica
## 
## 
##  Attribute usage:
## 
##  100.00% Petal.Length
## 
## 
## Time: 0.0 secs
print(c5_fit )
## C5.0 
## 
## 75 samples
##  4 predictor
##  3 classes: 'setosa', 'versicolor', 'virginica' 
## 
## Pre-processing: centered (4), scaled (4) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 75, 75, 75, 75, 75, 75, ... 
## Resampling results across tuning parameters:
## 
##   model  winnow  trials  Accuracy   Kappa    
##   rules  FALSE     1     0.9356369  0.9020211
##   rules  FALSE     2     0.9468418  0.9189541
##   rules  FALSE     3     0.9426944  0.9127059
##   rules  FALSE     4     0.9469801  0.9191674
##   rules  FALSE     5     0.9412658  0.9105012
##   rules  FALSE     6     0.9455515  0.9169627
##   rules  FALSE     7     0.9426944  0.9127059
##   rules  FALSE     8     0.9426944  0.9127059
##   rules  FALSE     9     0.9426944  0.9127059
##   rules  FALSE    10     0.9426944  0.9127059
##   rules  FALSE    20     0.9426944  0.9127059
##   rules  FALSE    30     0.9426944  0.9127059
##   rules  FALSE    40     0.9426944  0.9127059
##   rules  FALSE    50     0.9426944  0.9127059
##   rules  FALSE    60     0.9426944  0.9127059
##   rules  FALSE    70     0.9426944  0.9127059
##   rules  FALSE    80     0.9426944  0.9127059
##   rules  FALSE    90     0.9426944  0.9127059
##   rules  FALSE   100     0.9426944  0.9127059
##   rules   TRUE     1     0.9465171  0.9185157
##   rules   TRUE     2     0.9479986  0.9207414
##   rules   TRUE     3     0.9438962  0.9145538
##   rules   TRUE     4     0.9392644  0.9075463
##   rules   TRUE     5     0.9396791  0.9081496
##   rules   TRUE     6     0.9392644  0.9075463
##   rules   TRUE     7     0.9396791  0.9081496
##   rules   TRUE     8     0.9448001  0.9159041
##   rules   TRUE     9     0.9396791  0.9081496
##   rules   TRUE    10     0.9396791  0.9081496
##   rules   TRUE    20     0.9422598  0.9120521
##   rules   TRUE    30     0.9435501  0.9140078
##   rules   TRUE    40     0.9435501  0.9140078
##   rules   TRUE    50     0.9435501  0.9140078
##   rules   TRUE    60     0.9435501  0.9140078
##   rules   TRUE    70     0.9435501  0.9140078
##   rules   TRUE    80     0.9435501  0.9140078
##   rules   TRUE    90     0.9435501  0.9140078
##   rules   TRUE   100     0.9435501  0.9140078
##   tree   FALSE     1     0.9356369  0.9020211
##   tree   FALSE     2     0.9455515  0.9170136
##   tree   FALSE     3     0.9426944  0.9127059
##   tree   FALSE     4     0.9426944  0.9127059
##   tree   FALSE     5     0.9412658  0.9105012
##   tree   FALSE     6     0.9455515  0.9169627
##   tree   FALSE     7     0.9426944  0.9127059
##   tree   FALSE     8     0.9426944  0.9127059
##   tree   FALSE     9     0.9426944  0.9127059
##   tree   FALSE    10     0.9426944  0.9127059
##   tree   FALSE    20     0.9426944  0.9127059
##   tree   FALSE    30     0.9426944  0.9127059
##   tree   FALSE    40     0.9426944  0.9127059
##   tree   FALSE    50     0.9426944  0.9127059
##   tree   FALSE    60     0.9426944  0.9127059
##   tree   FALSE    70     0.9426944  0.9127059
##   tree   FALSE    80     0.9426944  0.9127059
##   tree   FALSE    90     0.9426944  0.9127059
##   tree   FALSE   100     0.9426944  0.9127059
##   tree    TRUE     1     0.9465171  0.9185157
##   tree    TRUE     2     0.9492486  0.9226377
##   tree    TRUE     3     0.9438962  0.9145538
##   tree    TRUE     4     0.9392644  0.9075463
##   tree    TRUE     5     0.9396791  0.9081496
##   tree    TRUE     6     0.9392644  0.9075463
##   tree    TRUE     7     0.9396791  0.9081496
##   tree    TRUE     8     0.9435501  0.9140078
##   tree    TRUE     9     0.9396791  0.9081496
##   tree    TRUE    10     0.9353934  0.9016881
##   tree    TRUE    20     0.9422598  0.9120521
##   tree    TRUE    30     0.9435501  0.9140078
##   tree    TRUE    40     0.9435501  0.9140078
##   tree    TRUE    50     0.9435501  0.9140078
##   tree    TRUE    60     0.9435501  0.9140078
##   tree    TRUE    70     0.9435501  0.9140078
##   tree    TRUE    80     0.9435501  0.9140078
##   tree    TRUE    90     0.9435501  0.9140078
##   tree    TRUE   100     0.9435501  0.9140078
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were trials = 2, model = tree and winnow
##  = TRUE.

Time for this code chunk to run: 10.29 seconds

We had one error out of 75: a versicolor classified as a virginica. Compare that with the results from the very first NN model displayed at the beginning, where we had 3 errors: one versicolor classified as a virginica & two virginicas classified as versicolors.

It can get tedious running all the different kinds of models. However, H2O supports two types of grid search: traditional (or "Cartesian") grid search and random grid search. In a Cartesian grid search, users specify a set of values for each hyper-parameter to search over, and H2O trains a model for every combination of the hyper-parameter values. In random grid search, the user specifies the hyper-parameter space in exactly the same way, except H2O samples uniformly from the set of all possible hyper-parameter value combinations. The user also specifies a stopping criterion, which controls when the random grid search is complete.
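A hedged sketch of what that looks like with R's h2o package. The hyper-parameter names below (max_depth, learn_rate, ntrees) are standard H2O GBM parameters, but the specific values are illustrative; the h2o.grid() call requires a running H2O cluster via h2o.init(), so it is guarded here.

```r
# Hyper-parameter space: the same specification serves Cartesian or random search
hyper_params <- list(
  max_depth  = c(3, 5, 7),
  learn_rate = c(0.01, 0.05, 0.1),
  ntrees     = c(50, 100, 200)
)

# Random search adds a stopping criterion; "Cartesian" would try all 27 combos
search_criteria <- list(
  strategy   = "RandomDiscrete",
  max_models = 10,   # stop after sampling 10 combinations
  seed       = 19
)

if (requireNamespace("h2o", quietly = TRUE)) {
  library(h2o)
  h2o.init()
  data(iris)  # fresh copy with Species as a factor
  iris_h2o <- as.h2o(iris)
  grid <- h2o.grid(
    algorithm       = "gbm",
    x               = 1:4, y = 5,
    training_frame  = iris_h2o,
    hyper_params    = hyper_params,
    search_criteria = search_criteria
  )
  h2o.getGrid(grid@grid_id, sort_by = "logloss")  # models ranked by logloss
}
```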

Continued on DATA SCIENCE, ARCHITECTURE, AI and ML Part II